Bioinformatics



Bioinformatics is a Division C trial event that is set to run in the 2022 season at the National Invitational as well as the BirdSO Invitational.

Biological Prerequisites

Central Dogma of Life

Nearly every individual has a unique sequence of nucleobases, and differences between these sequences underlie the diversity of life. The central dogma describes how this information flows: DNA is transcribed into RNA, and RNA is translated into protein. In eukaryotic cells, most DNA is stored inside the nucleus and does not leave it. RNA carries the information in DNA to the rest of the cell, where it is used to make proteins. Proteins make up, or carry out the work of, essentially every part of our body.
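The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the function names transcribe and translate are made up for this example, the input is assumed to be the coding (sense) strand, and CODON_TABLE holds only four entries of the standard genetic code, just enough for the demo sequence.

```python
# Tiny subset of the standard genetic code (assumption: only the codons
# needed for the demo sequence below are listed).
CODON_TABLE = {
    "AUG": "M",  # methionine, the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons left to right until a stop codon or an unlisted codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3])
        if amino_acid is None or amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # MFG
```

A real translation routine would use the full table of 64 codons; the cut-down table here only keeps the example short.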

Nucleic Acids and Proteins

There are five main nucleobases: adenine, thymine (DNA only), guanine, cytosine, and uracil (RNA only). In standard base pairing, adenine pairs with thymine in DNA and with uracil in RNA, while guanine pairs with cytosine in both DNA and RNA.
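The pairing rules above are enough to compute the complementary strand of a DNA sequence. The sketch below assumes the input contains only the letters A, C, G, and T; the names DNA_COMPLEMENT and reverse_complement are chosen for this example. The result is reversed because the two strands of DNA run in opposite directions.

```python
# Map each DNA base to its pairing partner: A<->T and G<->C.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```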

Types of Bioinformatic Databases

Sequence Alignment

Sequence alignment is the process of arranging DNA, RNA, or protein sequences to identify regions of similarity. Gaps can be inserted into the sequences so that identical or similar characters line up in successive columns. Typically, points are assigned for matches, mismatches, and gaps, and the best alignment is the one with either the highest or the lowest total score, depending on the algorithm. Dynamic programming is often used to find the best alignment efficiently.
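To make the scoring idea concrete, the sketch below scores an alignment that has already been written out with gap characters. The values used (match +1, mismatch -1, gap -2) are assumptions picked for illustration, not official values for the event, and alignment_score is a name invented for this example. With these values, a higher score means a better alignment.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Add up the column scores of two aligned rows of equal length.

    A "-" in either row marks a gap. The default scores are illustrative.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# Five matches, one mismatch, one gap: 5 - 1 - 2 = 2
print(alignment_score("GATTACA", "GA-TGCA"))  # 2
```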

Biological Motivation

Sequence alignments are modeled on mutations, in which bases or residues are substituted, inserted, or deleted. Finding these alignments helps identify mutations or mistakes, including ones that might be fatal.

Scoring Matrices

Pairwise Sequence Alignment

Pairwise sequence alignment compares exactly two sequences with each other.

Needleman-Wunsch Algorithm

The Needleman-Wunsch algorithm is sometimes called the optimal matching algorithm or the global alignment technique, because it aligns the two sequences over their entire length rather than over only a part of each.
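A minimal sketch of global alignment by dynamic programming is given below. It assumes a simple linear scoring scheme (match +1, mismatch -1, gap -2); these values and the function name needleman_wunsch are choices made for this example. When several alignments tie for the best score, this version returns only one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: a prefix aligned against nothing but gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align two characters, or gap in b, or gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("GATTACA", "GATGCA")
print(best)   # 2
print(row_a)
print(row_b)
```

The whole table is filled in before the traceback starts, which is what makes the alignment global: every character of both sequences ends up in some column.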

Smith-Waterman Algorithm

The Smith-Waterman algorithm performs local rather than global sequence alignment: it looks for similar regions within two sequences instead of aligning them end to end. Any cell of the scoring matrix that would be negative is set to zero, so no cell holds a negative score. However, it is impractical for large-scale problems because its time and space requirements are quadratic (proportional to the product of the two sequence lengths).
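Local alignment can be sketched in the same dynamic programming style. The version below assumes a linear scoring scheme (match +2, mismatch -1, gap -2); those values and the name smith_waterman are choices made for this example. Scores are clamped at zero, and the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart here instead of going negative.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Only the shared "ACG" region aligns; the flanking T's and G's are ignored.
print(smith_waterman("TTTACGTTT", "GGGACGGGG"))  # (6, 'ACG', 'ACG')
```

The table has (len(a) + 1) x (len(b) + 1) cells and each one is filled once, which is where the quadratic time and space cost comes from.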

Multiple Sequence Alignment

Multiple sequence alignment compares three or more sequences at once, possibly a whole database of sequences with one another.

Assessing Sequence Alignments

Protein Structure Analysis

Structural Alignment Methods

Methods for Secondary Structure Prediction

Methods for Three-Dimensional Structure Prediction

Computational Protein Design

Docking Algorithms

Rosetta

Resources

http://rosalind.info/problems/locations/