Tuesday, January 25, 2011

EXCEL

Today's assignment

LINEAR REGRESSION







In statistics, linear regression is an approach to modeling the relationship between a scalar variable y and one or more variables denoted X.
In linear regression, data are modeled using linear functions, and the unknown model parameters are estimated from the data. Such models are called linear models.
Most commonly, linear regression refers to a model in which the conditional mean of y given the value of X is an affine function of X.
Less commonly, linear regression could refer to a model in which the median, or some other quantile of the conditional distribution of y given X, is expressed as a linear function of X.
Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X, rather than on the joint probability distribution of y and X, which is the domain of multivariate analysis.
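
As a rough illustration of the conditional-mean model described above, here is a minimal sketch that fits y = a + b*x by ordinary least squares in plain Python. The function name fit_line and the sample numbers are made up for illustration. The formulas are the standard closed-form least-squares estimates, which is also what Excel's SLOPE and INTERCEPT worksheet functions compute.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up sample data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The fitted line is an estimate of the conditional mean of y for each x, not a curve forced through every data point.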















Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications.
This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.



Linear regression has many practical uses. Most applications of linear regression fall into one of the following two broad categories:
  • If the goal is prediction, or forecasting, linear regression can be used to fit a predictive model to an observed data set of y and X values. After developing such a model, if an additional value of X is then given without its accompanying value of y, the fitted model can be used to make a prediction of the value of y.
  • Given a variable y and a number of variables X1, ..., Xp that may be related to y, linear regression analysis can be applied to quantify the strength of the relationship between y and the Xj, to assess which Xj may have no relationship with y at all, and to identify which subsets of the Xj contain redundant information about y, so that once one of them is known, the others are no longer informative.
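
The two uses above can be sketched with a single predictor. This is only a small example under assumptions: the helper names (fit_line, predict, r_squared) and the data are invented here, and the coefficient of determination R² is used as one common way to express the strength of a fitted linear relationship.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(intercept, slope, x):
    """Predicted y for a new x that has no observed y."""
    return intercept + slope * x


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(intercept, slope, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
prediction = predict(intercept, slope, 6.0)   # use 1: forecasting
strength = r_squared(xs, ys, intercept, slope)  # use 2: strength of relationship
print(prediction, strength)
```

An R² close to 1 means the line explains most of the variation in y, while a value near 0 suggests that x carries little linear information about y.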






















QUADRATIC REGRESSION





Quadratic regression is a process by which the equation of a parabola of "best fit" is found for a set of data.

It is the second-order case of polynomial regression, a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth order polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x), and has been used to describe nonlinear phenomena such as the growth rate of tissues, the distribution of carbon isotopes in lake sediments, and the progression of disease epidemics.
Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y|x) is linear in the unknown parameters that are estimated from the data.
For this reason, polynomial regression is considered to be a special case of multiple linear regression.
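
To make the "linear in the parameters" point concrete, the sketch below treats 1, x and x² as three ordinary predictors and solves the resulting least-squares normal equations. It is a minimal plain-Python illustration under assumptions: the names solve and fit_polynomial and the test data are invented here, and a numerical library would normally be used for this.

```python
def solve(a, b):
    """Solve the square linear system a * coeffs = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs


def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree]."""
    size = degree + 1
    # Each row holds the predictors 1, x, x^2, ... for one observation.
    rows = [[x ** k for k in range(size)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(xtx, xty)


# Made-up data lying exactly on the parabola y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

c0, c1, c2 = fit_polynomial(xs, ys)
print(c0, c1, c2)
```

The curve is nonlinear in x, but the unknowns c0, c1 and c2 enter only linearly, so the same machinery as multiple linear regression applies.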



















All done for part one of KOS 1110.
THANKS MADAM LINDA

alhamdullilah . . . 

Tuesday, January 11, 2011

SMILES

The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. 


SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
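
To give a feel for what these short ASCII strings look like, here is a toy sketch in Python. It is not a real SMILES parser: the function count_heavy_atoms is invented for this example, it only recognises unbracketed atoms of the common organic elements, and it ignores hydrogens, which SMILES usually leaves implicit. Proper handling needs a cheminformatics toolkit.

```python
import re

# A few well-known molecules and their SMILES strings.
EXAMPLES = {
    "water": "O",
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",  # lowercase letters mark aromatic atoms
}

# Two-letter symbols must be tried before one-letter ones (Cl before C).
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles):
    """Count non-hydrogen atoms per element in a simple, bracket-free SMILES string."""
    counts = {}
    for symbol in ATOM_PATTERN.findall(smiles):
        element = symbol.capitalize()  # aromatic 'c' is still carbon
        counts[element] = counts.get(element, 0) + 1
    return counts


for name, smiles in EXAMPLES.items():
    print(name, smiles, count_heavy_atoms(smiles))
```

Digits such as the two 1s in the benzene string mark ring closures, and parentheses mark branches, which is how a linear string can describe a non-linear structure.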


The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community.


Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).



















our tutorial today:








































Tuesday, January 4, 2011

PROTEIN DATA BANK



The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids.


The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations (PDBe, PDBj, and RCSB). 


The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.




The PDB is a key resource in areas of structural biology, such as structural genomics. 
 
Most major scientific journals, and some funding agencies, such as the NIH in the USA, now require scientists to submit their structure data to the PDB.
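
The 3-D structural data mentioned above are commonly distributed as text files in which each atom is one fixed-width ATOM line. The sketch below shows, under assumptions, how such a line can be sliced by column position in Python. The function name parse_atom_record is invented here and the sample record is made up rather than copied from a real entry. The column positions follow the legacy PDB file format.

```python
def parse_atom_record(line):
    """Slice one fixed-width ATOM record into named fields."""
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue_name": line[17:20].strip(),
        "chain_id": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),  # coordinates are in angstroms
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Made-up example record, assembled piece by piece to show the column layout.
SAMPLE = (
    "ATOM  "      # columns 1-6: record type
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: unused
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: unused
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and unused columns
    "  27.340"    # columns 31-38: x coordinate
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
)

atom = parse_atom_record(SAMPLE)
print(atom)
```

Real files also carry occupancy, temperature factor and element columns after the coordinates, and newer depositions are often provided in the mmCIF format instead.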
 
 
 
 
 
 
 
GENE ONTOLOGY
 
The Gene Ontology (GO) is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics, together with gene product annotation data from GO Consortium members.


  • The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases.

  • The project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD), in 1998. 

  • Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes. 



THE ONTOLOGIES
  1. CELLULAR COMPONENT
  2. BIOLOGICAL PROCESS
  3. MOLECULAR FUNCTION



KEGG PATHWAY is a collection of manually drawn pathway maps representing our knowledge on the molecular interaction and reaction networks for:


0. Global Map
1. Metabolism
    Carbohydrate, Energy, Lipid, Nucleotide, Amino acid, Other amino acid, Glycan,
    Cofactor/vitamin, Terpenoid/PK, Other secondary metabolite, Xenobiotics
2. Genetic Information Processing
3. Environmental Information Processing
4. Cellular Processes
5. Organismal Systems
6. Human Diseases




7. Drug Development





 
ENZYME COMMISSION
The Enzyme Commission number (EC number) is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze. As a system of enzyme nomenclature, every EC number is associated with a recommended name for the respective enzyme.

Strictly speaking, EC numbers do not specify enzymes, but enzyme-catalyzed reactions.

If different enzymes (for instance from different organisms) catalyze the same reaction, then they receive the same EC number. 

By contrast, UniProt identifiers uniquely specify a protein by its amino acid sequence.
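
An EC number is written as four numbers separated by dots: the top-level class, the subclass, the sub-subclass and a serial number. The sketch below splits such a number into its levels in Python. The function name parse_ec_number is invented for this example, and only the six top-level classes of the 1992 edition are listed (a seventh class, translocases, was added later). The example input is the EC number of thermolysin.

```python
# Top-level enzyme classes as of the 1992 edition.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec_number(ec):
    """Split a complete EC number such as 'EC 3.4.24.27' into its four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    enzyme_class, subclass, sub_subclass, serial = (int(p) for p in parts)
    return {
        "class": enzyme_class,
        "class_name": EC_CLASSES.get(enzyme_class, "unknown"),
        "subclass": subclass,
        "sub_subclass": sub_subclass,
        "serial": serial,
    }


info = parse_ec_number("EC 3.4.24.27")  # thermolysin
print(info)
```

Because the number describes a reaction rather than a protein, two enzymes from different organisms can share every one of these four levels.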

The enzyme nomenclature scheme was developed starting in 1955, when the International Congress of Biochemistry in Brussels set up an Enzyme Commission.
The first version was published in 1961.
The current sixth edition, published by the International Union of Biochemistry and Molecular Biology in 1992, contains 3196 different enzymes.








FtSH PEPTIDASE..



author: Han, S.
release date: 2010-12-22


experiment 
X-RAY DIFFRACTION with resolution of 2.00 Å 

molecule
Penicillin-binding protein 3

polymer
1

chain
A

length
538

Structural basis for effectiveness of siderophore-conjugated monocarbams against clinically relevant strains of Pseudomonas aeruginosa. 

                                                            

EXPLANATION
Pseudomonas aeruginosa is an opportunistic Gram-negative pathogen that causes nosocomial infections for which there are limited treatment options.
Penicillin-binding protein PBP3, a key therapeutic target, is an essential enzyme responsible for the final steps of peptidoglycan synthesis and is covalently inactivated by β-lactam antibiotics.
Here we disclose the first high resolution cocrystal structures of the P. aeruginosa PBP3 with both novel and marketed β-lactams.
These structures reveal a conformational rearrangement of Tyr532 and Phe533 and a ligand-induced conformational change of Tyr409 and Arg489.
The well-known affinity of the monobactam aztreonam for P. aeruginosa PBP3 is due to a distinct hydrophobic aromatic wall composed of Tyr503, Tyr532, and Phe533 interacting with the gem-dimethyl group. 
The structure of MC-1, a new siderophore-conjugated monocarbam complexed with PBP3, provides molecular insights for lead optimization.
Importantly, we have identified a novel conformation that is distinct to the high-molecular-weight class B PBP subfamily, which is identifiable by common features such as a hydrophobic aromatic wall formed by Tyr503, Tyr532, and Phe533 and the structural flexibility of Tyr409 flanked by two glycine residues. 
This is also the first example of a siderophore-conjugated triazolone-linked monocarbam complexed with any PBP. 
Energetic analysis of tightly and loosely held computed hydration sites indicates protein desolvation effects contribute significantly to PBP3 binding, and analysis of hydration site energies allows rank ordering of the second-order acylation rate constants.  
Taken together, these structural, biochemical, and computational studies provide a molecular basis for recognition of P. aeruginosa PBP3 and open avenues for future design of inhibitors of this class of PBPs.

THERMOLYSIN..



author :
Steuber, H.,   Englert, L.,   Silber, K.,   Heine, A.,   Klebe, G.
 
release date
2009-12-08

length
316

chain
A


polymer
1

experiment
X-RAY DIFFRACTION with resolution of 1.75 Å

EXPLANATION
Fragment-based drug discovery has gained a foothold in today's lead identification processes. We present the application of in silico fragment-based screening for the discovery of novel lead compounds for the metalloendoproteinase thermolysin. We have chosen thermolysin to validate our screening approach as it is a well-studied enzyme and serves as a model system for other proteases. A protein-targeted virtual library was designed and screening was carried out using the program AutoDock. Two fragment hits could be identified. For one of them, the crystal structure in complex with thermolysin is presented. This compound was selected for structure-based optimization of binding affinity and improvement of ligand efficiency, while concomitantly keeping the fragment-like properties of the initial hit. Redesigning the zinc coordination group revealed a novel class of fragments possessing K(i) values as low as 128 microM, thus they provide a good starting point for further hit evolution in a tailored lead design.




LEUCYL AMINOPEPTIDASE





author
 
release date
2005-09-27

 experiment
X-RAY DIFFRACTION with resolution of 2.10 Å

compound
polymer & ligands

length
299

chain
A

 



EXPLANATION

Aminopeptidases specifically cleave the amino-terminal residue from polypeptide chains and are involved in the metabolism of biologically active peptides. 
The family includes zinc-dependent enzymes possessing either one or two zinc ions per active site. Structural studies providing a detailed view of the metal environment may reveal whether the one-zinc and two-zinc enzymes constitute structurally and mechanistically distinct subclasses, and what role the metal ions play in the catalytic process.
We have solved the crystal structure of the monomeric aminopeptidase from Aeromonas proteolytica at 1.8 Å resolution.
The protein is folded into a single alpha/beta globular domain.
The active site contains two zinc ions (3.5 Å apart) with shared ligands and symmetrical coordination spheres. We have compared it with the related bovine lens leucine aminopeptidase and the cobalt-containing Escherichia coli methionine aminopeptidase.


TO know more about PDB


JUST CLICK