Computer In Science: January 2011

Thursday, January 27, 2011

Excel

Linear regression is an approach to modeling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X. In linear regression, the data are modeled using linear predictor functions, and the unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, linear regression refers to a model in which the conditional mean of y given the value of X is an affine function of X. Less commonly, it refers to a model in which the median, or some other quantile, of the conditional distribution of y given X is expressed as a linear function of X. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X rather than on the joint probability distribution of y and X, which is the domain of multivariate analysis.
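The most common case is a single explanatory variable x, where the conditional mean of y is assumed to be b0 + b1*x. For that case the ordinary least-squares intercept and slope have a closed form. The following is a minimal sketch: the function name `fit_line` and the sample data are hypothetical and only illustrative. With several explanatory variables, the same idea generalizes to solving the normal equations.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sxx and Sxy: centered sum of squares and sum of cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = fit_line(xs, ys)
print(f"y = {b0:.3f} + {b1:.3f} x")
```

Spreadsheet programs such as Excel produce the same intercept and slope for a linear trendline. The closed form is shown here only so that the arithmetic is visible.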

Quadratic regression models are often used in economics, for example for utility functions, forecasting, and cost-benefit analysis. The JavaScript tool referred to here fits a parabolic (quadratic) regression model, and its site also presents useful information about the characteristics of the fitted quadratic function.
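The JavaScript tool itself is not reproduced here. As a stand-in, the Python sketch below fits y = ax² + bx + c by least squares and reports the vertex, which is one of the characteristics usually quoted for a fitted parabola. The fit works by solving the 3×3 normal equations with Gaussian elimination. The names `fit_quadratic` and `vertex` are hypothetical. In practice a numerical library would normally be used instead, for example NumPy's `numpy.polyfit(x, y, 2)`.

```python
def fit_quadratic(xs, ys):
    """Least-squares parabola y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")
    # Power sums s[k] = sum(x**k) and moments t[k] = sum(y * x**k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations for unknowns (c, b, a): row i is sum_j s[i+j] * coef_j = t[i].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("x values do not determine a unique parabola")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # Back substitution, last unknown first.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    c, b, a = coef
    return a, b, c

def vertex(a, b, c):
    """Turning point (x_v, y_v) of y = a*x**2 + b*x + c."""
    if a == 0:
        raise ValueError("not a parabola: a == 0")
    x_v = -b / (2 * a)
    return x_v, a * x_v ** 2 + b * x_v + c

# Points taken from a known parabola, so the fit should recover a=2, b=-3, c=1.
xs = [-2, -1, 0, 1, 2]
ys = [2 * x ** 2 - 3 * x + 1 for x in xs]
a, b, c = fit_quadratic(xs, ys)
xv, yv = vertex(a, b, c)
print(f"y = {a:.3f}x^2 + {b:.3f}x + {c:.3f}, vertex ({xv:.3f}, {yv:.3f})")
```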

To solve problems involving quadratic regression, you need to:

know how to enter data into your graphing calculator for modeling problems
know how to solve quadratic equations
know how to find the quadratic equation that best fits a given set of data
write and solve an equation for the problem
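After a quadratic model has been fitted, writing and solving an equation for the problem usually means finding the x at which the model reaches a given y. That is, you solve ax² + bx + (c - y) = 0 with the quadratic formula. The following is a minimal sketch. `solve_for_x` is a hypothetical helper name, and the coefficients in the usage lines are illustrative rather than taken from any data set in this post.

```python
import math

def solve_for_x(a, b, c, target=0.0):
    """Real solutions of a*x**2 + b*x + c = target, in ascending order."""
    c0 = c - target  # move the target to the left-hand side
    if a == 0:
        if b == 0:
            raise ValueError("equation does not depend on x")
        return [-c0 / b]  # linear case
    disc = b * b - 4 * a * c0
    if disc < 0:
        return []  # the parabola never reaches the target value
    if disc == 0:
        return [-b / (2 * a)]  # target equals the vertex value
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

# Illustrative model y = 2x^2 - 3x + 1: where does it reach y = 6?
print(solve_for_x(2, -3, 1, target=6))  # [-1.0, 2.5]
```

In a word problem, usually only one root makes physical sense (for example, a non-negative time). That choice is part of interpreting the equation, not solving it.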

The examples are as follows:

-Beer's Law Scatter Plot and Linear Regression-

-Titration of 50 mL of 0.1 M HCl with 0.1 M NaOH-

-Line of Best Fit-

-Quadratic Regression-
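The titration curve for this example can be calculated as well as plotted. The sketch below rests on three assumptions: the temperature is 25 °C, so Kw = 1.0 × 10⁻¹⁴; HCl and NaOH dissociate completely; and concentrations stand in for activities. Under these assumptions, the charge balance [H+] - [OH-] = (excess acid)/(total volume) combined with [H+][OH-] = Kw gives one formula. That formula holds before, at, and after the equivalence point at 50 mL. The function name `titration_ph` is hypothetical.

```python
import math

KW = 1.0e-14  # ion product of water; assumes 25 °C

def titration_ph(v_base_ml, v_acid_ml=50.0, c_acid=0.1, c_base=0.1):
    """pH after adding v_base_ml of NaOH (c_base mol/L) to v_acid_ml of HCl (c_acid mol/L).

    Assumes complete dissociation and uses concentrations in place of activities.
    """
    n_acid = c_acid * v_acid_ml / 1000.0        # mol H+ initially present
    n_base = c_base * v_base_ml / 1000.0        # mol OH- added
    v_total = (v_acid_ml + v_base_ml) / 1000.0  # total volume in litres
    # Charge balance: [H+] - [OH-] = d, together with [H+][OH-] = KW.
    d = (n_acid - n_base) / v_total
    s = math.sqrt(d * d + 4 * KW)
    # Positive root of h^2 - d*h - KW = 0; the second form avoids cancellation past equivalence.
    h = (d + s) / 2 if d >= 0 else 2 * KW / (s - d)
    return -math.log10(h)

for v in (0, 25, 49, 50, 51, 75, 100):
    print(f"{v:5.1f} mL NaOH -> pH {titration_ph(v):5.2f}")
```

The sharp jump between 49 and 51 mL is the steep section that appears on a plotted titration curve.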

Tuesday, January 11, 2011

SMILES

SMILES (Simplified Molecular Input Line Entry System) is a line notation (a typographical method using printable characters) for entering and representing molecules and reactions.
SMILES contains the same information as might be found in an extended connection table. The primary reason SMILES is more useful than a connection table is that it is a linguistic construct, rather than a computer data structure. SMILES is a true language, albeit with a simple vocabulary (atom and bond symbols) and only a few grammar rules. SMILES representations of structure can in turn be used as "words" in the vocabulary of other languages designed for storage of chemical information (information about chemicals) and chemical intelligence (information about chemistry).
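The small vocabulary can be made concrete with a tokenizer. The hypothetical sketch below splits a SMILES string into its "words": bracket atoms, organic-subset atoms (including the two-letter Br and Cl), lowercase aromatic atoms, bond symbols, branch parentheses, ring-closure labels, and the dot separator. It is based on a regular expression and only tokenizes. It does not check the grammar rules, such as valence or ring pairing.

```python
import re

# One alternative per kind of SMILES "word". Order matters: the two-letter
# atoms Br and Cl must be tried before the one-letter atoms B and C.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"     # bracket atom, e.g. [n+], [C@H]
    r"|Br|Cl"         # two-letter organic-subset atoms
    r"|[BCNOPSFI]"    # one-letter organic-subset atoms
    r"|[bcnops]"      # aromatic atoms (lowercase)
    r"|\*"            # wildcard atom
    r"|[-=#$:/\\]"    # bond symbols, including / and \ for double-bond geometry
    r"|[()]"          # branch open / close
    r"|%\d{2}|\d"     # ring-closure labels
    r"|\."            # separator between disconnected parts
)

def tokenize(smiles):
    """Split a SMILES string into tokens; raise ValueError on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens

print(tokenize("CC1=CC(Br)CCC1"))
# ['C', 'C', '1', '=', 'C', 'C', '(', 'Br', ')', 'C', 'C', 'C', '1']
```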

Part of the power of SMILES is that unique SMILES exist. With standard SMILES, the name of a molecule is synonymous with its structure; with unique SMILES, the name is universal. Anyone in the world who uses unique SMILES to name a molecule will choose the exact same name.

One other important property of SMILES is that it is quite compact compared to most other methods of representing structure. A typical SMILES string takes 50% to 70% less space than an equivalent connection table, even a binary one. For example, a database of 23,137 structures, with an average of 20 atoms per structure, uses only 1.6 bytes per atom when represented with SMILES. In addition, ordinary compression of SMILES is extremely effective: the same database was reduced to 27% of its original size by Ziv-Lempel compression (i.e., 0.42 bytes per atom).
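The bytes-per-atom figure is simple arithmetic: the total size of the SMILES text divided by the number of atoms. The sketch below computes that figure for a few strings. It then compresses them with Python's `zlib` module, whose DEFLATE format combines LZ77 (a Ziv-Lempel method) with Huffman coding. This is not necessarily the compressor used for the database figures above. With only a handful of short strings, the compressed output may not be smaller than the input. The 27% ratio applies to a large collection, so it should not be expected here. The helper names are hypothetical, and only atoms written explicitly in the string are counted (implicit hydrogens are not).

```python
import re
import zlib

# Atoms written explicitly in the string; implicit hydrogens are not counted.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")

def bytes_per_atom(smiles_list):
    """Return (bytes per written atom, encoded text) for newline-separated SMILES."""
    text = "\n".join(smiles_list).encode("ascii")
    atoms = sum(len(ATOM.findall(s)) for s in smiles_list)
    return len(text) / atoms, text

# A few small structures; a real measurement would use a whole database.
structures = ["C1CCCCC1", "Oc1ccccc1", "CCN(CC)CC", "CC(C)C(=O)O", "c1ccccn1"]
bpa, raw = bytes_per_atom(structures)
packed = zlib.compress(raw, 9)
print(f"{bpa:.2f} bytes per atom uncompressed")
print(f"zlib output is {len(packed) / len(raw):.0%} of the input size")
```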

These properties open many doors to the chemical information programmer. Examples of uses for SMILES are:

Keys for database access
Mechanism for researchers to exchange chemical information
Entry system for chemical data
Part of languages for artificial intelligence or expert systems in chemistry
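The first use, SMILES strings as database keys, can be sketched with Python's built-in `sqlite3` module. The table layout is hypothetical. The sketch assumes that every key is already a unique (canonical) SMILES string produced by a cheminformatics toolkit. Plain string keys only match when the spelling is identical, as the second lookup shows.

```python
import sqlite3

# Hypothetical table: canonical SMILES as the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (smiles TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO compound (smiles, name) VALUES (?, ?)",
    [
        ("C1CCCCC1", "cyclohexane"),
        ("Oc1ccccc1", "phenol"),
        ("c1ccccn1", "pyridine"),
        ("CCN(CC)CC", "triethylamine"),
    ],
)

def lookup(smiles):
    """Return the stored name for an exact SMILES key, or None."""
    row = conn.execute("SELECT name FROM compound WHERE smiles = ?", (smiles,)).fetchone()
    return row[0] if row else None

print(lookup("Oc1ccccc1"))  # phenol
print(lookup("c1ccccc1O"))  # None: same molecule, different (non-canonical) spelling
```

This is why unique SMILES matter for database keys. If every client canonicalizes before querying, the exact-match lookup above is enough.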

Structural images of SMILES

SMILES notation (the structural images are not reproduced here):

C=CC\C=C\O
CCN(CC)CC
CC(C)C(=O)O
CC(C)C(CCC)C(CCC)C=C
C1CCCCC1
CC1=CC(Br)CCC1
C1CN(CCC1)C2CCCCO2
c1ccco1
Oc1ccncn1
c1ccccn1
ON1CCCCC1
O[n+]1ccccc1
Oc1ccccc1
Cn1cccc1
C[C@H]=C\C=C\F
c1cccn1
Oc1ccccn1
C\C=C\C=C\F
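The strings in this table use parentheses for branches and pairs of digits for ring closures. For example, the two 1s in C1CCCCC1 mark the bond that closes the cyclohexane ring. A shallow, hypothetical syntax check for those two rules is sketched below. It does not validate valence, aromaticity, or stereochemistry, so passing it does not guarantee that a chemistry toolkit will accept the string.

```python
import re

# Bracket atoms and %nn labels are single tokens; every other character is its own token.
TOKEN = re.compile(r"\[[^\]]+\]|%\d{2}|.")

def check_smiles_syntax(smiles):
    """Shallow check of branch parentheses and ring-closure pairing; returns a list of problems."""
    problems = []
    depth = 0
    open_rings = set()
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            depth += 1
        elif tok == ")":
            if depth == 0:
                problems.append("unmatched ')'")
            else:
                depth -= 1
        elif tok.isdigit() or (tok.startswith("%") and len(tok) == 3):
            # The first use of a label opens a ring bond; the next use closes it.
            open_rings ^= {int(tok.lstrip("%"))}
    if depth:
        problems.append(f"{depth} unclosed '('")
    problems.extend(f"ring-closure label {n} never closed" for n in sorted(open_rings))
    return problems

for s in ["C1CN(CCC1)C2CCCCO2", "C1CCCCC", "CC(C"]:
    print(s, check_smiles_syntax(s) or "ok")
```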

Wednesday, January 5, 2011

Protein Data Bank

The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations (PDBe, PDBj, and RCSB). The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.
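Entries can be downloaded either in the legacy fixed-column PDB format or in PDBx/mmCIF. As a minimal sketch of the legacy format, the parser below reads ATOM and HETATM records by column position. The fields used are the atom name (columns 13-16), residue name (18-20), chain ID (22), residue number (23-26), x/y/z coordinates in ångströms (31-54), and element symbol (77-78). The two ATOM lines are a hypothetical, hand-made example rather than data from a real entry. For real work, a maintained parser such as Biopython's `Bio.PDB` is the safer choice.

```python
import math

def parse_atoms(pdb_text):
    """Extract ATOM/HETATM records from legacy fixed-column PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END and other non-coordinate records
        atoms.append({
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (float(line[30:38]),       # columns 31-38
                    float(line[38:46]),       # columns 39-46
                    float(line[46:54])),      # columns 47-54
            "element": line[76:78].strip(),   # columns 77-78
        })
    return atoms

# Hypothetical two-atom fragment in the fixed-column layout.
SAMPLE = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      12.560   6.208  -6.270  1.00  0.00           C
END
"""

atoms = parse_atoms(SAMPLE)
n_xyz, ca_xyz = atoms[0]["xyz"], atoms[1]["xyz"]
print(f"N-CA distance: {math.dist(n_xyz, ca_xyz):.2f} Å")
```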

The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals, and some funding agencies such as the US National Institutes of Health (NIH), now require scientists to submit their structure data to the PDB. If the contents of the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to type of structure and assumed evolutionary relations; GO categorizes structures based on genes.[1]

The pictures shown below are examples of protein structures.

Subtilisin

Subtilisin (serine endopeptidase) is a non-specific protease (a protein-digesting enzyme) initially obtained from Bacillus subtilis.

Subtilisins belong to the subtilases, a group of serine proteases that initiate the nucleophilic attack on the peptide (amide) bond through a serine residue at the active site. They are physically and chemically well-characterized enzymes. Subtilisins typically have molecular weights of about 20,000 to 45,000 daltons. They can be obtained from soil bacteria, for example Bacillus amyloliquefaciens, and are secreted in large amounts by many Bacillus species.

Prolyl Aminopeptidase

Xaa-Pro aminopeptidase 1 is an enzyme that in humans is encoded by the XPNPEP1 gene.[1]
X-prolyl aminopeptidase (EC 3.4.11.9) is a proline-specific metalloaminopeptidase that specifically catalyzes the removal of any unsubstituted N-terminal amino acid that is adjacent to a penultimate proline residue. Because of its specificity toward proline, it has been suggested that X-prolyl aminopeptidase is important in the maturation and degradation of peptide hormones, neuropeptides, and tachykinins, as well as in the digestion of otherwise resistant dietary protein fragments, thereby complementing the pancreatic peptidases. Deficiency of X-prolyl aminopeptidase results in excretion of large amounts of imino-oligopeptides in urine (Blau et al., 1988). [supplied by OMIM][1]
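The specificity rule described above can be written as a toy function on one-letter sequences: remove the N-terminal residue only when the next residue is proline. This is a hypothetical illustration of the rule, not a model of enzyme kinetics, and it assumes a free (unsubstituted) N-terminus. The nonapeptide bradykinin (RPPGFSPFR), whose sequence begins Arg-Pro, is used as example input.

```python
def xpro_aminopeptidase_cleave(peptide):
    """Toy model of X-prolyl aminopeptidase specificity (hypothetical helper).

    Removes the N-terminal residue only when the penultimate (second) residue
    is proline ('P'); returns (released_residue, remaining_peptide) or None.
    Assumes a free N-terminus and ignores kinetics and other sequence effects.
    """
    if len(peptide) >= 2 and peptide[1] == "P":
        return peptide[0], peptide[1:]
    return None

print(xpro_aminopeptidase_cleave("RPPGFSPFR"))  # ('R', 'PPGFSPFR')
print(xpro_aminopeptidase_cleave("GAPV"))       # None: second residue is not Pro
```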

LexA Repressor

Repressor LexA, or simply LexA, is a repressor enzyme (EC 3.4.21.88) that represses SOS response genes, including genes coding for DNA polymerases required for repairing DNA damage. LexA is intimately linked to RecA in the biochemical cycle of DNA damage and repair: RecA binds to DNA-bound LexA, causing LexA to cleave itself in a process called autoproteolysis.
DNA damage can be inflicted by antibiotics. Bacteria require topoisomerases such as DNA gyrase or topoisomerase IV for DNA replication. Antibiotics such as ciprofloxacin block these enzymes by binding to the gyrase-DNA complex. The resulting damage is counteracted by the repair polymerases produced in the SOS response. Unfortunately, this response is partly counterproductive. Ciprofloxacin-induced DNA damage activates RecA and therefore the SOS response, so the bacteria respond to the antibiotic by producing more repair proteins. Some of these repair polymerases are error-prone, and the mutations they introduce can eventually render the bacteria resistant to ciprofloxacin.