Anders Bergkvist, Department of Molecular Biology.
Analysis of sequence alignment covariances

Project background: In 1994, Göbel et al. developed an algorithm that mines sequence alignments for covariation between alignment positions. Pazos et al. subsequently applied Göbel's algorithm to predict pairs of neighbouring amino acids at the interface between two interacting proteins. Inspired by these advances, Dr Bergkvist has begun designing an alternative algorithm that serves the same purpose. In contrast to the earlier methods, this algorithm is based on a statistical analysis of relevant biophysical properties of the amino acids in the sequence alignments, and the results so far are promising. However, the statistical analysis assumes that the protein sequences are independent of one another, so evolutionary relationships between the sequences may give rise to artefacts in the analysis.

Specific plan and suitable previous knowledge: The aim of the project is to continue developing Dr Bergkvist's algorithm. A compensation factor is needed to account for evolutionary relationships between the protein sequences and to alleviate the corresponding artefacts in the results, and this factor should be incorporated into the code. The algorithm has so far been developed in Java, but porting it to C may be advantageous; in any case, the more advanced statistical analysis probably needs to be developed in C. Besides familiarity with these programming languages, experience with any of the following may be advantageous: protein sequence alignment algorithms, phylogenetic relationships, DNA replication and repair, incorporation of genetic mutations, and evolution.

References:
Göbel et al., Proteins: Structure, Function and Genetics 18 (1994), 309-317
Pazos et al., Journal of Molecular Biology 271 (1997), 511-523

Keywords: Bioinformatics, Computer programming, Protein sequence alignments, Evolution, Phylogenetic relationships, Covariances, C, C++, Java, Statistics.

===================================================================

Analysis of the correlation of NMR chemical shifts to protein dihedral angles

Project background: It has long been known in the field of NMR (nuclear magnetic resonance) that the chemical shift of a nucleus depends on its local electronic environment. For example, the chemical shift of a nucleus in a protein is affected by the relative positions of the electronic orbitals of neighbouring atoms. This has been used to determine qualitatively the secondary structure of short peptide segments in proteins of unknown structure (see, for example, Wishart et al. and Cornilescu et al.). A more recent publication (Wang and Jardetzky) extends these studies by quantifying the correlation with a statistical approach. An accurate quantification of the correlation between chemical shifts and structural geometry has important implications for protein structure determination, and could extend current NMR capabilities to the analysis of protein folding and protein interactions.

Specific plan and suitable previous knowledge: The student will analyze a database of chemical shifts and relate the shifts to the corresponding protein structures in other databases. The aim is a statistical quantification (analogous to that of Wang and Jardetzky) of the relationship between chemical shifts and the associated peptide dihedral (torsion) angles. Using dihedral angles rather than secondary-structure classifications is expected to improve the correlations significantly, and thereby also structure predictions based on chemical shifts.
Data may be collected from the databases with scripts written in, for example, Perl (or any other suitable programming language). The subsequent statistical analysis is probably best developed in C. Besides familiarity with these programming languages, experience with any of the following may be advantageous: protein structures, NMR, database programming, and statistics.

References:
Wishart, Sykes and Richards, Biochemistry 31 (1992), 1647-1651
Cornilescu, Delaglio and Bax, Journal of Biomolecular NMR 13 (1999), 289-302
Wang and Jardetzky, Protein Science 11 (2002), 852-861

Keywords: NMR chemical shifts, Protein secondary structure, Dihedral torsion angles, Statistics, Database, Computer programming, C, Perl.

===================================================================

Molecular docking program and molecular structure visualisation on the web

Project background: Our understanding of protein structures has developed rapidly since the late 1950s. Today about 20,000 protein structures are known, and there are realistic projections that all existing protein folds will be represented in the Protein Data Bank (PDB) in the near future (see Berman et al.). The wealth of information in the PDB is a treasure trove for humanity, and work is underway to translate it into useful applications. One important current development is the prediction and analysis of interactions between proteins and their binding partners, which range from small organic molecules, such as drugs, to much larger molecules such as other proteins. Many different strategies for molecular docking have been developed, and even more computer programs implement those strategies (Halperin et al.).

Specific plan and suitable previous knowledge: The large number of available strategies and programs makes systematic analyses and evaluations of docking predictions difficult. The aim of this project is to create a web interface that brings together available implementations of molecular docking and molecular structure databases, in order to facilitate systematic analyses. The project is suitable for two or more students. One student would probably need previous knowledge of PHP, Linux server administration and MySQL. The other would probably need previous knowledge of computer graphics and molecular visualisation, web design, and protein structures. Overlapping experience, as well as experience with molecular docking, is obviously an advantage.

References:
Berman, Goodsell and Bourne, American Scientist 90(4) (2002), 350-359
Halperin et al., Proteins 47 (2002), 409-443

Keywords: Molecular docking, protein structure, web design, computer graphics, PHP, MySQL, rational drug design.

===================================================================

Cloning and expression of the DNA repair and cell-cycle associated protein Mus81

Project background: Mapping protein-protein interaction networks has recently come into vogue, thanks to advances in experimental techniques and to the growing wealth of available genomic sequence and protein structure data. Protein-protein interactions are important because they underlie many processes in living cells, for example the detection of external signals, DNA repair and synchronization of the cell cycle. High-throughput techniques such as two-hybrid screening or mass spectrometry (see, for example, Ho et al.) can identify many interacting protein pairs, but they provide no information on the nature of the interaction, that is, on how the proteins interact at the molecular level. Nuclear magnetic resonance (NMR), on the other hand, can identify the interaction interface between molecules at atomic resolution.

Specific plan and suitable previous knowledge: Mus81 is a protein recently identified as an important component and interaction hub in the DNA metabolism network (Haber and Heyer). The structure of the protein is known (Nishino et al.), but the way it interacts with its various partners remains to be thoroughly characterized. The long-term goal of the project is to clone and express Mus81 and several of its interaction partners, and to characterize their interactions using NMR and bioinformatics techniques. In the first phase, a student would clone the mus81 gene, as well as constructs encoding separate domains of the protein, and express the corresponding proteins in E. coli. NMR techniques and biochemical assays would be used to validate protein quality and to characterize the products. Further projects, subsequent or in parallel, are readily available. Suitable previous experience includes gene cloning, protein purification, biochemical assays, and familiarity with NMR spectroscopy.

References:
Haber and Heyer, Cell 107 (2001), 551-554
Ho et al., Nature 415 (2002), 180-183
Nishino et al., Structure 11 (2003), 445-457

Keywords: Protein interaction, NMR, gene cloning, protein purification, DNA metabolism, biochemistry.

===================================================================