A Brief Primer on Proteins for Bioinformatics Non-Biologists

You might recall a Grand Breakthrough in biology around the year 2000: the complete sequencing of human DNA. People hoped that proteins would be next. But now, about 20 years later, our understanding of PROTEINS is nowhere near what we have for DNA... Why? Because proteins are devilishly complex.

If you're managing a database (relational or semantic) featuring proteins, you might think that the entities (records) of your database are simply "proteins", and that you'll just need a number of fields (attributes) to describe those records... Right? Wrong!

A little basic biology brush-up for starters... Recall that DNA is a sequence of 4 "letters" (nucleotides). Proteins are sequences of 20 "letters" (amino acids). So why the immense complexity of proteins? There are 20^200 (roughly 10^260) possible amino-acid sequences for a 200-residue protein, of which the natural evolutionary process has sampled only an infinites