



1. User (Autonomous Agent) request(s)/application selections [Morale/Cohesion 3 part format-right-side]
2. Feasibility study [Goals/Objectives 4 part format-right-side]
3. Investigation [Goals/Objectives 3 part format-left-side]
4. Analysis [Norms/Standards 5 part format-left-side]
5. Systems design [Goals/Objectives 4 part format-right-side]
6. Programming [Morale/Cohesion 5 part format-left-side]
7. Systems testing [Power/Authority 3 part format-right-side]
8. Documentation [Norms/Standards 3 part format-left-side]
9. Conversion and implementation [Goals/Objectives 3 part format-right-side]
10. Maintenance [Goals/Objectives 4 part format-left-side]
11. Evaluation [Norms/Standards 3 part format-left-side]

1. Project initiation (Hardware/Software) Power/Authority
2. Project development (The Project) Norms/Standards
3. Project implementation (The User Climate/Autonomous Agent Conditions of Configuration) Goals/Objectives
4. Post project evaluation (The Systems Analysts/Autonomous Agent Activities) Morale/Cohesion

1. Input subsystems [3 part Norms/Standards]
2. Computer subsystems [3 part Norms/Standards]
3. Output subsystems [3 part Norms/Standards]

1. Method Phase-One [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
2. Method Phase-Two [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
3. Method Phase-Three [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
4. Method Phase-Four [5 part Goals/Objectives (The Dictionary of Occupational Titles)]
5. Method Phase-Five [5 part Goals/Objectives (The Dictionary of Occupational Titles)]

Taxonomy Table

The kingdom is the largest unit of classification. Initially it was thought that there were only two kingdoms, plants and animals. Eventually microscopes and other tools helped clarify the existence of other organisms. Under the widely taught five-kingdom system, there are a total of 5 kingdoms: Animalia - the largest, with over 1 million named species (fish, humans); Plantae - 350,000 species (trees, grass); Fungi - 100,000 species (mushrooms, lichen); Protista - 100,000 species (green, golden, brown, and red algae, flagellates); Monera - 10,000 species (blue-green algae, or cyanobacteria).

One of the most interesting fields in the study of biology is taxonomy. Although there are other fields such as ecology and embryology, taxonomy is easy to comprehend, restricted to a small set of structural information, and good to know as a reference. Taxonomy, also called systematics, is the study of the classification of all living organisms. The current method of taxonomy was started by Carolus Linnaeus. It arranges organisms into groups within groups within groups, on and on until an organism is defined within its own species or individual group. This orderly classification helps scientists in a number of ways. One is that it keeps them clearly in sync with other scientists, because a universal system exists. It also helps scientists identify evolutionary links between certain species.

The phylum (called a division for plants) is the next most specific unit of classification. This further divides the kingdom into 20 or so groups based on very distinct and defining characteristics. For example, within the Animal Kingdom, a major phylum is the chordates, which are animals with notochords. This includes humans, fish, other mammals, etc. Flowering plants are placed in the Anthophyta division of the Plant Kingdom.

How it works:
Originally, when Linnaeus founded taxonomy, organisms were divided based solely on visible physical characteristics. Now they're separated based on any unique and defining features, mainly external physical features and secondarily other features such as feeding habits. Each organism is named using binomial nomenclature, in which an organism has two words to its name. The first name is the genus and the second name is the species. For example, humans are scientifically called Homo sapiens - genus Homo, species sapiens. The words that make up the names for the individual groups of taxonomy are based on the Greek or Latin language. This makes for a universal language throughout the world. Otherwise an English scientist mentioning a "cat" to a Chinese scientist could be misunderstood because of language differences. There are international commissions that help filter and record an updated listing of the classifications. Some names are based on the equivalent characteristics of the organism in Latin, while others have no descriptive meaning at all and are simply named after their discoverer.

The class further classifies the organism. It separates organisms into categories that make them very similar in terms of certain basic features. For example, the class Mammalia includes all animals that nurse their young with milk, which includes humans, cows, dolphins, etc. Another class is Reptilia, which includes cold-blooded, scaled animals.

Organisms of the same order are more similar than those of the same class. A lot of obvious evolutionary connections can be drawn from looking at the order; only a few features separate the organisms, marking a break in the evolutionary chain. One example is that within the class Mammalia, carnivores are placed in the order Carnivora while insect eaters are placed in the order Insectivora.

The Origins of Taxonomy:
Classification has been around on earth ever since people paid attention to organisms. One early system was based on "harmful" and "non-harmful" organisms. Then Aristotle became the first to form a useful system of classification, during the 300s BC. His system was first based on whether or not the organism had red blood. Then he subdivided organisms such as plants by physical characteristics such as size and features. This system is somewhat crude by today's standards, yet it lasted over 2,000 years. Eventually, as communication improved and science advanced to a reasonable point, modern classification started to develop. Its best-known founder was the Swedish naturalist Carolus Linnaeus in the 1700s. He developed the system by which organisms are classified based on their unique characteristics. He also invented binomial nomenclature for naming. Linnaeus agreed with scientists that his work was somewhat crude, but its purpose and general concepts were continually applied. Over time, as evolutionary studies advanced, the classification system became more sophisticated, showing different groups and links. And as time goes on, classifications continue to change and are ever growing.

The family is even more specific; the animals within a family share very close similarities with each other. Most will probably have the same behavior patterns, feeding habits, and general functions. An example is the cat family (Felidae), whose members all have whiskers and sharp claws, and which includes animals such as lions and domestic cats.

The genus makes up the first word of the binomial nomenclature of an organism. All the organisms within a genus may look very similar to each other. And although the results are often not healthy, organisms of the same genus may sometimes breed with each other.

The most specific unit of classification is the species. The species makes up all the organisms and their apparent ancestors and descendants. Members of the species are very similar to their parents and can freely breed with other members of the same species without much complication.
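The nesting of ranks described in the Taxonomy Table, and the two-word naming rule, can be sketched in a few lines of Python. This is only an illustration: the rank list follows the order used above, the human entries are the standard names, and the lion entries and the helper names `binomial_name` and `shared_ranks` are assumptions introduced for the example.

```python
# Ranks from most general to most specific, in the order described above.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Standard classification of humans.
HUMAN = {
    "kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
    "order": "Primates", "family": "Hominidae",
    "genus": "Homo", "species": "sapiens",
}

# Illustrative second organism (not taken from the text): the lion.
LION = {
    "kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
    "order": "Carnivora", "family": "Felidae",
    "genus": "Panthera", "species": "leo",
}

def binomial_name(classification):
    """Two-word scientific name: capitalized genus, then lowercase species."""
    genus = classification["genus"].capitalize()
    species = classification["species"].lower()
    return f"{genus} {species}"

def shared_ranks(a, b):
    """Ranks two organisms have in common, walking down from the kingdom."""
    shared = []
    for rank in RANKS:
        if a[rank] != b[rank]:
            break  # once two organisms diverge, all lower ranks differ too
        shared.append(rank)
    return shared

print(binomial_name(HUMAN))
print(shared_ranks(HUMAN, LION))
```

Humans and lions share a kingdom, phylum, and class but part ways at the order, which is the kind of link between groups the classification is meant to expose.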

The draft sequence of the human genome has been integrated into many existing resources to facilitate biological discovery. The map below represents the interconnections between different types of public biological data available at NCBI.

Cellular Chemistry

Hold on to your seat! This document attempts to cover the essentials of several chemistry courses- general chemistry, organic chemistry, and biochemistry- but only what a beginning biology student needs to know to survive cell biology, anatomy, physiology, microbiology, and related biology courses. This document assumes you know NO chemistry. If that is the case, it is normal to feel a bit overwhelmed as you study this material, but have courage- many students have studied and survived this material, and succeeded in their biology studies. You can too!

All organisms are made of cells, but cells are made of organelles and other subcellular components, which are made of molecules- orderly arrangements of atoms, or elements. Atoms are so small that only 12 grams of carbon, such as a small piece of charcoal, contains the amazing quantity of about 602,200,000,000,000,000,000,000 atoms! So imagine how small a single atom is! There are many atoms or elements that exist, such as sodium, oxygen, copper, gold, and carbon. Though atoms differ in their physical properties, all atoms are similar in structure, in that they are all made of just three varieties of subatomic particles.

The Atomic Structure
The atomic structure is such that an atom has a central region, that is a nucleus, composed of protons and neutrons, and orbiting electrons.

Protons (symbolized as p) have a mass of 1 atomic mass unit (AMU) and an atomic electrical charge of +1. Neutrons (symbolized as n) have a mass of 1 atomic mass unit (AMU) and have no electrical charge (they are neutral). Electrons (e-) orbit the nucleus at various distances, or shell levels. These minute, fast-moving particles have a mass of almost zero (about 0.0005 atomic mass units [AMUs]), and they have an atomic electrical charge of -1. Normally their number equals the number of protons in the nucleus; in this way, the atom remains electrically neutral.

Calculating the Structure of an Atom or Molecule
The weight, that is the mass, of an atom or molecule, as well as its net electrical charge, can be determined if the atom's or molecule's composition of atomic particles is known. The reverse is also true. Useful formulas for performing such calculations include the following (where p, n, and e are symbolic for proton, neutron, and electron, and # is symbolic for 'the number of'):
Net Atomic Mass = (#p + #n)
Net Atomic Charge = (#p - #e)

[Sample atom with mass of 7 and net charge of 0]

Example: an atom with 5 neutrons, 3 protons, and 7 orbiting electrons would have a net atomic mass of 8 (=5+3) and a net atomic electrical charge of -4 (=3-7).

Example: How many protons and neutrons are there in an atom with a mass of 23 and 12 orbiting electrons if you know that the atom has a net charge of +3?
Solution: Since the charge is +3, there are 3 more protons than there are electrons (12), so there must be 15 protons. The number of neutrons is 23-15=8.
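The two formulas, and the reverse calculation from the second example, can be written as a short Python sketch. The function names are illustrative choices, not part of the original text.

```python
def net_atomic_mass(protons, neutrons):
    """Net Atomic Mass = #p + #n (electron mass is treated as zero)."""
    return protons + neutrons

def net_atomic_charge(protons, electrons):
    """Net Atomic Charge = #p - #e."""
    return protons - electrons

def protons_and_neutrons(mass, electrons, charge):
    """Reverse calculation: recover #p and #n from mass, #e, and net charge."""
    protons = electrons + charge   # rearranged from charge = #p - #e
    neutrons = mass - protons      # rearranged from mass = #p + #n
    return protons, neutrons

# First example: 5 neutrons, 3 protons, 7 electrons.
print(net_atomic_mass(3, 5), net_atomic_charge(3, 7))

# Second example: mass 23, 12 electrons, net charge +3.
print(protons_and_neutrons(23, 12, 3))
```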

The Hydrogen Atom
H atom and H+ ion

Hydrogen atoms are the simplest of all atoms, having a nucleus with a single proton and a single orbiting electron. The mass of the H atom is about 1.008, with the electron contributing only about 0.0005 atomic mass units. If the electron is lost from the H atom, then a lone proton, p, remains, and is positively charged. The resulting particle is a hydrogen ion, electrically charged because the lone proton's charge is not balanced by any electron. The hydrogen ion is symbolized as H+. Hydrogen ions are very important biologically because they are small and electrically charged, and can cause havoc to protein structure and cell function; this is particularly critical when H+ ions interact with enzyme proteins, which are essential for cell metabolism.

The scale used to measure the concentration of H+ ions in a solution (blood, cytoplasm, etc.) is the pH scale. The pH scale runs from 0 to 14, with 7 neutral, values below 7 acidic, and values above 7 alkaline or basic.
|pH2-------pH4------------pH7------------pH11-------pH14|
|(acid pH) ------------(neutral)-------------(basic pH)|

Acids, that is molecules that release H+ ions, lower pH, and a low pH implies high concentrations of H+ ions. Bases, that is molecules that capture H+ ions, raise pH, and a high pH implies low concentrations of H+ ions. Pure water has a neutral pH. Blood has a pH of about 7.35. Vinegar has a pH of about 4. Concentrated sulfuric acid has a pH of about 1. Stomach acid has a pH of about 2. Lye, found in some drain and oven cleaners, creates an extremely alkaline (basic) pH when added to a solution, resulting in a pH of about 12-14. Cell cytoplasm typically has a slightly acidic pH.

The pH scale is a log scale, based on powers of 10, so that a pH 6 solution has ten times the acidity of a pH 7 solution, and a pH 5 solution has ten times the acidity of a pH 6 solution. A pH 5 solution has one hundred times the acidity of a pH 7 solution. Note that low pH implies high levels of H+, and that high pH implies low levels of H+ (most beginning students confuse this, so make a mental note of the reverse nature of the pH scale).
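The powers-of-10 behavior of the pH scale is easy to check numerically. The sketch below assumes the standard definition pH = -log10 of the H+ concentration (in moles per liter), which the text implies but does not state; the function names are illustrative.

```python
import math

def relative_acidity(ph_a, ph_b):
    """How many times more H+ solution A contains than solution B."""
    return 10 ** (ph_b - ph_a)

def h_ion_concentration(ph):
    """H+ concentration in moles per liter, assuming pH = -log10[H+]."""
    return 10 ** (-ph)

def ph_from_concentration(concentration):
    """Inverse of h_ion_concentration."""
    return -math.log10(concentration)

print(relative_acidity(6, 7))   # pH 6 compared with pH 7
print(relative_acidity(5, 7))   # pH 5 compared with pH 7
print(h_ion_concentration(2))   # stomach acid, pH of about 2
```

Each step down the scale multiplies the H+ concentration by ten, which is why a low pH means a high H+ level.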

Empty Space
There is a lot of empty space between an atomic nucleus and its orbiting electrons, and there is a lot of empty space between each e-. It is often said that if all the empty space were removed from all the atoms of all the people on planet earth, the entire population could be condensed into a container about the size of a thimble! And a single human being such as yourself could in theory be shrunk to a speck far too small to see. In fact, protons and neutrons are themselves made of still smaller particles called quarks.

Quarks are what actually comprise protons and neutrons. There are six varieties, or "flavors," of quarks: up, down, charm, strange, top, and bottom (no kidding!). They don't really have a flavor or any charm; physicists simply chose whimsical names to make the naming of subatomic particles a bit more fun for everyone!

What holds all these subatomic particles together? Protons and neutrons are held together in the nucleus by the strong nuclear force, which is carried by particles called gluons. Gravity is a separate matter. Gravitons are theoretical particles proposed to carry the gravitational force, the attraction of all matter for all other matter (the reason your feet stay attracted to the ground and you do not fly off into outer space, and the reason the moon orbits the earth).


Molecules are combinations of atoms, held together as a "team" by various forces called molecular bonds. Like a chain gang, if one atom in a molecule moves in one direction, the others are obliged to follow; though separate atoms, together they form a molecule. The molecule illustrated above in 3D is acetic acid (common vinegar acid)- the red spheres represent oxygen atoms, black carbon atoms, and white hydrogen atoms. Bond types that hold atoms or molecules together, or in close proximity, include (in order of strongest to weakest): covalent, ionic, hydrogen, and Van der Waals Forces.

Covalent Molecular Bonds

This type of bonding occurs when two atoms share their orbiting electrons, somewhat as if two children were to stand inside two hula-hoops (each hoop being an orbiting electron) and spin the hoops around themselves. Neither child can leave the spinning pair of hoops (electrons) that keep them in proximity to each other. Covalent bonds are strong, and each covalent bond, that is each pair of shared orbiting electrons between atoms, is symbolized by a straight line drawn between the atoms. Sometimes two pairs of electrons, that is 4 e-, are shared between two atoms; then a double covalent bond occurs, and this is symbolized by a double line (=). Three or four pairs of electrons can be shared, and that results in triple and quadruple covalent bonds, symbolized by, you guessed it, 3 or 4 lines drawn between the atoms, respectively.

For example, consider a molecule composed of 2 atoms of hydrogen and 1 atom of oxygen. A water molecule can be written as H2O, or drawn as H-O-H. Consider a molecule similar to water, hydrogen sulfide (rotten egg gas- stinky!), composed of 2 atoms of hydrogen and 1 atom of sulfur. A molecule of hydrogen sulfide gas can be written as H2S, or drawn as H-S-H. All of the structures below represent hydrogen sulfide.



Hydrogen Molecular Bonding
Hydrogen bonds are attractions between hydrogen atoms and one or more of the following atoms: O, N, S, P, Cl, F. The six atoms just listed can be thought of as electron 'thieves,' stealing the majority of an electron's orbit time from the covalent bond that O, N, S, P, Cl, or F are a part of; as a consequence, the electron 'thief' atom takes on a partial negative charge density. Hydrogen atoms, in stark contrast, are very weak at keeping their electron in orbit about the hydrogen proton nucleus; a hydrogen atom's electron can be stolen away most of the time by O, N, S, P, Cl, or F atoms, causing the hydrogen atom to take on a partial positive charge density (caused by the proton being left without its orbiting electron most of the time). The result is a partial negative charge density about O, N, S, P, Cl, or F attracting nearby partial positive charge densities about H. Voila! A hydrogen bond. It is hydrogen bonds that cause water molecules to have such strong attractions to each other, which is why such a high temperature is needed to make water molecules escape from a water solution as steam.

Van der Waals Forces

These are weak attractions between atoms that lie close together. Alone, each force is weak, but when stacked they become strong, much like lining up several batteries in series to add up their voltages (such as in a flashlight). Van der Waals forces are significant in a cell's DNA genetic code, where the coiled DNA molecules have their carbon atoms stacked. In this way the Van der Waals forces help hold DNA together in its helical coil arrangement.

Ionic Molecular Bonding
This occurs when there are electrical attractions between electrically charged atoms or molecules, that is between ions. Ions are atoms or molecules where the number of protons does not equal the number of orbiting electrons. This creates an electrical imbalance, so that the atom is now an ion, having either a net positive charge (cation) or a net negative charge (anion). Ionic bonding, also known as a salt bond, occurs when a cation (positively charged atom or molecule) is electrically attracted to an anion (negatively charged atom or molecule). Table salt, sodium chloride or Na+Cl-, is a common example of a compound held together by ionic bonds. Often the anionic atom species has stolen an electron from the cation atom species, creating the charged ions.
Anion(-) :::::: (+)Cation

The cathode of a battery attracts cations, because the cathode is negatively charged. The anode of a battery attracts anions because it is positively charged. Don't confuse a cathode with a cation- they have opposite electrical charges and so attract each other. Likewise with an anode and anions.
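The definition of an ion as an imbalance between protons and electrons can be expressed as a small Python sketch. The function name `classify_ion` is an illustrative choice; the proton counts used in the examples (11 for sodium, 17 for chlorine) are the standard atomic numbers.

```python
def classify_ion(protons, electrons):
    """Label a particle by its net charge (#p - #e)."""
    charge = protons - electrons
    if charge > 0:
        return "cation", charge       # more protons than electrons
    if charge < 0:
        return "anion", charge        # more electrons than protons
    return "neutral atom", 0

# Table salt: sodium has given up one electron, chlorine has gained one.
print(classify_ion(11, 10))   # Na+
print(classify_ion(17, 18))   # Cl-
```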

Salts are combinations of cations and anions, such as ordinary table salt, Na+Cl-, but the term salt can be applied to any combination of cation and anion, including complex and large molecules, such as Tetracycline Hydrochloride (tetracycline H+Cl-), where the tetracycline is ionized to form a cation, but is kept stable in solution by combining with a chloride anion (Cl-).

Important Atoms, Ions, and Small Molecules studied in biology include (memorize this list!):

• H Hydrogen atom
• H+ Hydrogen ion (pH is a measure of H+ in a solution)
• C Carbon atom (present in almost all cell molecules)
• O Oxygen atom
• Na Sodium atom
• Na+ Sodium ion (vital for cell membrane excitability)
• P Phosphorus atom (don't confuse this with Potassium!)
• K Potassium atom
• K+ Potassium ion (vital for cell membrane excitability)
• Cl Chlorine atom
• Cl- Chloride ion
• S Sulfur atom (present in many proteins)
• N Nitrogen atom (critical for amino acids and proteins)
• Ca++ Calcium ion (bone, cell excitability, and hormone regulation)
• Mn++ Manganese ion (stabilizes cell enzymes)
• Mg++ Magnesium ion (stabilizes cell enzymes)
• CO2 Carbon dioxide gas
• O2 Oxygen gas
• HCO3- Bicarbonate anion
• Zn++ Zinc ion
• Zn Zinc metal

Symbolic representations of atoms and bonds are commonly seen, or used, when observing or drawing chemical structures. When you understand the secrets used by the wizards who draw molecules, you too will easily decipher molecular representations! So here are a few rules to commit to memory:

1. Remember that carbon atoms almost always form 4 covalent bonds, so each carbon atom in a molecule should have 4 bonds associated with it. Look at the 3D molecule of methane below- can you see the carbon atom (black) and the hydrogen atoms (white)? The carbon atom has formed 4 covalent bonds, one with each hydrogen atom (note that hydrogen atoms form only 1 covalent bond with whatever they bond with).

2. If you see a molecular drawing where a carbon has fewer than 4 covalent bonds, the remaining "unseen" bonds are always hydrogen atoms bonded to the carbon atom; they are usually not drawn, so that wizards can draw molecules faster. Look at the wizardry representations of a common organic molecule, benzene.

3. When you see a straight line extending off a carbohydrate molecule (sugar, starch) into space, with no atoms at the end of the line, it is a wizard's trick (those sneaky wizards): wizards know that at the end of that line there is always an oxygen and then a hydrogen atom, a pair otherwise known as a hydroxyl group (-O-H, or -OH).

4. When you see molecular bonds drawn with angular bends in them, there is always a carbon atom at the bend or angle, even though the wizards do not draw it and so it looks like nothing is there; but now you know better!
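Rules 1 and 2 amount to simple arithmetic: a carbon makes 4 bonds, so whatever is not drawn must be hydrogen. The sketch below is a minimal illustration of that counting only (it does not cover the hydroxyl shortcut in rule 3), and the function name and input format are assumptions made for the example.

```python
CARBON_BONDS = 4  # rule 1: carbon almost always forms 4 covalent bonds

def implicit_hydrogens(drawn_bonds):
    """Number of undrawn hydrogens on one carbon (rule 2).

    drawn_bonds lists the bonds shown on the carbon in a line drawing:
    1 for a single line, 2 for a double line, 3 for a triple line.
    """
    used = sum(drawn_bonds)
    if used > CARBON_BONDS:
        raise ValueError("a carbon atom cannot have more than 4 bonds")
    return CARBON_BONDS - used

# Benzene: 6 ring carbons, each drawn with one single and one double line.
benzene_ring = [[1, 2] for _ in range(6)]
total_h = sum(implicit_hydrogens(carbon) for carbon in benzene_ring)
print(total_h)   # hydrogens hidden in the benzene line drawing
```

The count for benzene matches its formula, C6H6: six carbons at the bends of the ring (rule 4) and one hidden hydrogen on each.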

Common Biological Molecules
The most common biological molecules include:

Carbohydrates: Generally have an atomic ratio of 1C:2H:1O, that is 1 carbon for every oxygen and twice as many hydrogen atoms as either carbon or oxygen atoms.
o Sugars- glucose, fructose, sucrose, and so on. Important for energy and for building genes.
o Starches and related polysaccharides- animal starch (glycogen), plant starch, and cellulose. These are simply multiple sugars bonded together, with various branching patterns among the bonds between the sugars.

N-containing Molecules
o Amino acids- the building blocks of proteins. The amino acid shown below is leucine, one of 3 amino acids known as the branched chain amino acids, natural anabolic nutrients that help build muscle mass and other tissues.
o Peptides- small proteins; sometimes the term peptide is used in place of protein. A short peptide is illustrated below in 3D (some hydrogen atoms are hidden from view).
o Proteins- enzymes, muscle protein, collagen skin protein, and so on.
o Urea, Ammonia- waste products of amino acid and protein metabolism.


Lipids: substances that are not readily soluble (mixable) in water. The molecule benzene is illustrated below- it is a ring of 6 carbon atoms (black) with 6 attached hydrogen atoms (white); benzene is a common solvent used in organic chemistry and biochemistry for synthesizing other molecules, and in industry for cleaning. It is symbolized below as both a 3D model and a line drawing. Can you use your knowledge of wizardry (see above) to spot the carbon and hydrogen atoms in the line drawing? (Line drawings are common because they can be drawn quickly.)
o Benzene (C6H6), gasoline, oils, grease.
o Fatty acids- found in food oils and other foods as dietary fatty acids; a calorie source, as well as important in cell membranes. A fatty acid is illustrated below.
o Prostaglandins- small lipids, actually fatty acids, that also act as chemical messengers. Prostaglandin E (PGE) is illustrated below using line drawing notation (can you spot the 20 carbon atoms using the chemistry wizardry rules?).
o Triglycerides- common fat calorie storage molecules, made of 3 fatty acids linked to a glycerol backbone.
o Steroid hormones (estrogen, testosterone)- complex cyclic lipids used as chemical messengers that travel in the blood to target cells.
o Cholesterol (used to make steroid hormones).

Organic Acids- abundant during cell metabolism of sugars and fats. Acetic acid is illustrated below- it is formed during aerobic metabolism of carbohydrates or lipids within cells.

Carbohydrates include the sugars and starches. They always contain a great deal of O, H and C, with a ratio of [C(H2O)]n, that is 1C:2H:1O. Carbohydrates are important biologically as nutrients, structural components, and as antigens. Incidentally, the little n subscript is like an algebraic variable- it refers to an unspecified number of multiples of the unit to which it is attached, in this case a molecule containing some multiple of C, H, and O in a specified ratio of 1:2:1. Sugars combine to form disaccharides (two sugar molecules linked together, such as glucose + fructose forming sucrose, or cane sugar), polysaccharides (simple chains of sugars), and then starch (chains of sugars with complex branching patterns). The most common biological sugar is glucose, a six carbon sugar. Naturally occurring sugars are what chemists call right-handed, or D sugars, as in D-glucose, D-galactose, D-fructose. Sugars can also be left-handed, or L sugars. D and L refer to the handedness of the molecule's three dimensional arrangement; the letters come from dextro (right) and levo (left), terms that describe the direction in which a molecule bends light in a special instrument, that is, whether it is dextro-rotatory or levo-rotatory.

Ribose sugar is illustrated below (3D on the left and line drawing on the right)- it is the sugar used for part of cell genetics, that is for making ribosomes, transfer RNA, and messenger RNA. By removing only one oxygen atom from ribose, a cell can form deoxyribose, the sugar used to build deoxyribonucleic acid (DNA).
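The [C(H2O)]n ratio and the ribose to deoxyribose step can be illustrated with a short Python sketch. The function names are illustrative, and the sketch only counts atoms; it says nothing about how the atoms are arranged.

```python
def carbohydrate_formula(n):
    """Atom counts for [C(H2O)]n: n carbons, 2n hydrogens, n oxygens."""
    return {"C": n, "H": 2 * n, "O": n}

def fits_carbohydrate_ratio(formula):
    """True when the atom counts are in the 1C:2H:1O ratio."""
    return formula["C"] == formula["O"] and formula["H"] == 2 * formula["C"]

def remove_one_oxygen(formula):
    """Return a copy of the formula with a single oxygen atom removed."""
    changed = dict(formula)
    changed["O"] -= 1
    return changed

glucose = carbohydrate_formula(6)        # six carbon sugar
ribose = carbohydrate_formula(5)         # five carbon sugar
deoxyribose = remove_one_oxygen(ribose)  # the sugar used to build DNA

print(glucose)
print(deoxyribose, fits_carbohydrate_ratio(deoxyribose))
```

Deoxyribose no longer fits the strict 1:2:1 ratio, a reminder that the ratio is a general rule for simple sugars rather than a law for every carbohydrate-derived molecule.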

Starches are long chains of sugar molecules with complex branching patterns of bonding between the sugar molecules. The two principal long-chain carbohydrates discussed here are glycogen and cellulose. Chitin is another long-chain carbohydrate that also contains nitrogen components; chitin is very strong structurally, and forms the dense protective shell of crabs, insects, and other animals as well as certain microbes. Glycogen is animal starch, stored in animal cells; it serves as a reserve nutrient source, because sugar molecules can be cleaved off and used for fuel. Cellulose is the structural carbohydrate of plants, providing the structural integrity of plant cell walls; plants store their reserve sugar as plant starch.

Lipids are substances that are not soluble in water. Lipids include dietary fats (cholesterol, fatty acids in margarine and other foods) as well as oils, grease, gasoline, steroid hormones, prostaglandin hormones, and many other biological molecules. Structurally, lipids are composed of lots of carbon and hydrogen atoms. Attached to the lipid at various points may be other atoms such as oxygen, or a side group such as a hydroxyl group (OH), but the great majority of a lipid's composition is C and H.

Amino acids and Proteins
Proteins are very important molecules, functioning both as structural components of cells and as enzymatic molecules that catalyze (speed up) chemical reactions in cells. Proteins are made of building blocks called amino acids, there being about 22 different amino acids found in nature.

Amino acids (and hence proteins) have what chemists call a left handed (L) configuration, so that naturally occurring amino acids are named L-arginine, L-leucine, and so on. NutraSweet (aspartame) artificial sweetener is actually a synthetic substance consisting of only two amino acids bonded together. So why does it add almost no calories? Because it is so much sweeter than sugar that only a tiny amount is needed. All amino acids (abbreviated as AA) have a generic structure, with one end of the AA having an amino group (-NH2) and the other end having an organic acid group (-COOH), sometimes called the carboxyl group.

Hence the name amino acid. Amino acids combine to form small chains of amino acids called peptides, or even longer AA chains called polypeptides or proteins. Sometimes the terms peptide, polypeptide, and protein are used interchangeably, because of the disagreement among scientists as to what constitutes a peptide versus a polypeptide versus a protein. The bond that forms between amino acids to form peptides, polypeptides, and proteins is called the peptide bond, and is formed between amino and carboxyl groups. During peptide bond formation, water is removed, so the reaction is that of a dehydration synthesis reaction. The reverse of bond formation is bond breaking, by addition of water, in what is called a hydrolysis degradation reaction.

Muscle, skin, and connective tissue proteins, as well as intracellular proteins, are all formed by joining amino acids together with dehydration reactions. During starvation, hydrolysis of proteins yields free amino acids that are used in metabolism to help provide energy.
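Because each peptide bond releases one water molecule, the bookkeeping for dehydration synthesis and hydrolysis is straightforward. The sketch below assumes a single unbranched chain; the function names are illustrative, and the molar masses (water about 18.015, glycine about 75.07, alanine about 89.09 g/mol) are standard approximate values that do not come from the text.

```python
WATER_MASS = 18.015  # approximate molar mass of H2O in g/mol (assumed value)

def peptide_bonds(num_amino_acids):
    """Bonds in a linear chain: one fewer than the number of amino acids."""
    return max(num_amino_acids - 1, 0)

def waters_released(num_amino_acids):
    """Dehydration synthesis removes one water per peptide bond formed."""
    return peptide_bonds(num_amino_acids)

def peptide_mass(amino_acid_masses):
    """Mass of the chain: free amino acid masses minus the water removed."""
    waters = waters_released(len(amino_acid_masses))
    return sum(amino_acid_masses) - waters * WATER_MASS

# A two amino acid peptide built from glycine and alanine.
print(waters_released(2))
print(round(peptide_mass([75.07, 89.09]), 3))
```

Hydrolysis runs the same arithmetic in reverse: breaking every bond in a chain of n amino acids consumes n-1 water molecules.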

Cellular Genetics

DNA, RNA, Transcription, Translation, mRNA, tRNA, codon, anticodon
Genetic Encoding
Ciphering of cell information, that is genetic encoding, occurs in the form of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) in viruses, but always as DNA in cells (at least on our planet, but who knows what occurs in other galaxies?). Structurally, DNA is not that complicated a molecule. It has a simple backbone consisting of alternating units of the sugar deoxyribose and phosphate (PO4). Attached to each sugar is a genetic code "word," a nucleotide base. Two such strands of DNA are usually bonded together to form what is called a "double helix" of DNA. The double stranded DNA molecule is twisted to form a helix, appearing as if a ladder were twisted along its axis. Biochemically, the nucleotide bases of DNA are known as purines and pyrimidines. It is the nucleotide bases, or rather their sequence, that constitutes the actual genetic code for all of a cell's proteins. There are four nucleotide bases in DNA: adenine, thymine, guanine, and cytosine. They are abbreviated as A, T, G, and C. As you will learn soon, a codon, that is a sequence of three bases, codes for a single amino acid. { How many bases along a length of DNA would be needed to code for an enzyme protein composed of 1000 amino acids? Answer: 3000. }

RNA
RNA is very similar to DNA. In RNA the base uracil, U, is substituted for thymine, T. So there is no thymine in RNA. Also, RNA uses the sugar ribose, not deoxyribose.

Genes
Along a molecule of DNA are various sequences of nucleotides coding for various proteins. Each sequence of nucleic acid coding for a protein is called a gene. Typically there will be hundreds or thousands of genes along a length of DNA, interspersed with special nucleotide sequences that are start (e.g. TAC) and stop (e.g. ACT or ATT) signals for gene reading by the cell.

DNA Arrangements
Viruses sometimes have ssDNA for their genome (or dsDNA, or ssRNA, or dsRNA); however, all cells have double stranded DNA (two strands of DNA twisted on each other in a helical pattern). Hence the term double helix for the three dimensional structural description of a double stranded DNA (dsDNA) molecule. Two opposite strands (lengths) of DNA are able to associate because certain base pairs have a binding affinity for each other. This is known as complementary base pairing. A readily pairs with T, and G with C; these are known as the base pairing rules of DNA. Though two strands of single stranded DNA (ssDNA) are twisted on each other, each strand can carry its own unique genes, even though the base sequence of one strand determines the sequence of its complementary strand. When dsDNA is genetically decoded, the helix is unzipped, a gene is "read" off the appropriate strand by RNA polymerase enzyme (that creates mRNA), and then the helix is zippered shut. dsDNA is typically circular in bacteria, but linear in eukaryotic cells. Circular dsDNA is like taking two lengths of string, twisting them on each other, and then joining the ends into a loop. Linear dsDNA is like taking two lengths of string and then twisting them on each other. A chromosome is a dsDNA molecule coiled around special histone proteins. Chromosomes are visible in a stained cell when using a light microscope, and are normally visible when a cell is in the process of division. When a cell is not dividing, there is less coiling of the dsDNA around the histone proteins, and then the complex is called chromatin. Chromatin is barely visible in a cell, and is the normal state of the genetic material in a non-dividing cell.

Gene Decoding
Decoding of a gene to create a gene product involves transcription and translation.
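The base pairing rules from DNA Arrangements (A with T, G with C) and the three-bases-per-amino-acid rule can be checked with a small Python sketch. The function names are illustrative, and the sketch deliberately ignores strand direction, which this text does not discuss.

```python
# DNA base pairing rules: A pairs with T, and G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand):
    """Base-by-base partner strand (strand direction is ignored here)."""
    return "".join(DNA_PAIRS[base] for base in strand.upper())

def bases_needed(num_amino_acids, bases_per_codon=3):
    """Number of coding bases required for a protein of the given length."""
    return num_amino_acids * bases_per_codon

print(complementary_strand("TACAGC"))
print(bases_needed(1000))   # the enzyme example from the text
```

Pairing a strand twice returns the original sequence, which is why either strand of the double helix is enough to rebuild the other.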
Transcription is the process where RNA polymerase enzyme unzips a region of dsDNA and reads a gene sequence, creating a copy of the gene sequence in the form of RNA. This RNA is called messenger RNA, or mRNA. The mRNA then is carried to a ribosome where it is "read" (decoded).

Translation is the process of building an amino acid chain, that is a polypeptide (protein), by way of ribosomal decoding of the mRNA. This involves the ribosome reading the mRNA and the bonding together of appropriate amino acids coded for by the mRNA. Just as triplets of nucleotide bases along DNA, called codons, encode for 1 amino acid of a gene product, triplets of mRNA are also codons. Ribosomes read the mRNA codons one at a time, to determine what amino acid should become part of the gene product. As a codon is read, the complementary anti-codon nucleotide sequence of a special amino acid carrier molecule called transfer RNA, or tRNA, base pairs with the codon, bringing with it the specific amino acid that is coded for by the mRNA. As each mRNA codon calls its specific amino acid into place through the use of specific anti-codon complementary tRNA amino acid carriers, the genetically encoded amino acid sequence for the gene product is brought into place. The ribosome enzymatically bonds the amino acids together, and a polypeptide, or protein, is built. The translation process is complete, and the gene has been decoded.

Amino Acids Table

By consulting a table of codons coding for each amino acid, you can decipher a genetic sequence of DNA nucleotides or mRNA nucleotides to determine the resulting gene product, that is a protein (structural or enzymatic). For example, the nucleotide base sequence on mRNA (transcribed from the DNA sequence AGC) of UCG codes for the amino acid serine.
mRNA codons (rows: first base of codon; columns: second nucleotide base of mRNA codon; * = any base)

First base   U                  C          A           G
U            UUU=phe            UC*=ser    UAU=tyr     UGU=cys
             UUC=phe                       UAC=tyr     UGC=cys
             UUA=leu                       UAA=stop    UGA=stop
             UUG=leu                       UAG=stop    UGG=trp
C            CU*=leu            CC*=pro    CAU=his     CG*=arg
                                           CAC=his
                                           CAA=gln
                                           CAG=gln
A            AUU=ile            AC*=thr    AAU=asn     AGU=ser
             AUC=ile                       AAC=asn     AGC=ser
             AUA=ile                       AAA=lys     AGA=arg
             AUG=met (start)               AAG=lys     AGG=arg
G            GU*=val            GC*=ala    GAU=asp     GG*=gly
                                           GAC=asp
                                           GAA=glu
                                           GAG=glu



Amino Acid Symbol    Amino Acid
Ala                  Alanine
Asp                  Aspartic acid
Asn                  Asparagine
Cys                  Cysteine
Glu                  Glutamic acid
Phe                  Phenylalanine
Gly                  Glycine
His                  Histidine
Ile                  Isoleucine
Lys                  Lysine
Leu                  Leucine
Met                  Methionine
Pro                  Proline
Gln                  Glutamine
Arg                  Arginine
Ser                  Serine
Thr                  Threonine
Val                  Valine
Trp                  Tryptophan
Tyr                  Tyrosine
Glu,Gln              Glutamic acid or Glutamine
* (End)              Terminator

Cellular Replication

Mitosis, meiosis, budding, binary fission, conjugation
A detailed monograph on cell replication is available for those seeking more in-depth information.

Bacterial Cell Replication Binary fission is the normal method of replication among bacteria; in this method of cell replication, the bacterial cells simply increase their cell mass slightly, replicate their cellular genome (DNA) and several other cell components, and then each cell divides equally into two cells.

[ rod-shaped bacterial cell undergoing binary fission ]

Binary fission as a method of cell replication is very efficient, with division possible as often as every 10 minutes or so for the fastest-growing bacteria under ideal conditions! Consider the number of cells formed from 1 cell that divides every 10 minutes: in just a matter of hours millions of cells may form from just a single cell!

Conjugation is another means of bacterial "replication," although the cells do not really replicate as with binary fission. But conjugation is important for propagation of bacteria. In conjugation, two bacterial cells meet, form a bridge, and pass pieces of DNA from one cell to the other. This allows for sharing of genes among bacteria, even among different genera.
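The doubling arithmetic described above can be checked with a short calculation. This is a minimal sketch that assumes idealized growth: every cell divides exactly once per doubling time, no cells die, and nutrients never run out. The function names are illustrative only.

```python
def cells_after(minutes, doubling_time_min=10.0, start_cells=1):
    """Cells present after `minutes`, assuming each cell divides once per doubling time."""
    generations = int(minutes // doubling_time_min)  # count completed divisions only
    return start_cells * 2 ** generations


def minutes_to_reach(target_cells, doubling_time_min=10.0):
    """Minutes until a single cell has produced at least `target_cells` cells."""
    cells, generations = 1, 0
    while cells < target_cells:
        cells *= 2
        generations += 1
    return generations * doubling_time_min


for hours in (1, 2, 3, 4):
    print(hours, "h:", cells_after(hours * 60), "cells")

print("minutes to exceed one million cells:", minutes_to_reach(1_000_000))
```

Under these assumptions one cell passes a million descendants after 20 doublings, a little over three hours. Real cultures slow down well before such numbers continue to double, as nutrients are used up.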

Eukaryotic Cell Replication Budding is a simple method of cell replication used principally by yeasts (single celled fungi). Following DNA replication (genome replication), unequal splitting of a cell occurs to form two cells. Part of the cell literally pinches off, taking with it genetic material as well as some cytoplasmic material. Mitosis is the common form of cell replication for tissue growth and regeneration among all multi-cellular organisms. The image panel below shows various phases of mitosis occurring among plant cells of an onion root tip. Each phase of cell division will be discussed individually.

During cell division, replication of cell genetic and cytoplasmic material occurs, followed by a highly organized splitting of the cell's contents. The two cells formed following mitosis, called daughter cells (lower right image in the six-panel image seen above), are genetically identical, and each has about 1/2 the cell mass of the original cell; shortly, however, each daughter cell will increase its size to that of a typical cell of the type from whence each daughter cell originated. The process of mitosis is divided for human convenience into discrete stages or phases (also divisible into early, middle, and late phases) known as interphase, prophase, metaphase, anaphase, telophase, and finally daughter cells. These six phases of mitosis can be seen in the photo below, if you read the photos as you would two lines in a book (left to right, then down to the second row and again left to right).

Animal Cells
Interphase During interphase cells are busy doing their normal cell activities. Cell metabolism is occurring. The cell is doing whatever its normal function is (this depends on the cell's genetic programming). Interphase is actually not part of the normally listed phases of cell replication. Prophase. During interphase, the DNA is replicated in preparation for prophase. A new set of genes (DNA) will be needed for the new cell that will be formed. As prophase occurs, the DNA coils tightly and becomes visible as chromosomes. The chromosomes are randomly arranged in the cell. The nuclear membrane disappears.

Plant Cells

Metaphase. During metaphase a cell aligns its chromosomes in the middle region of the cell. Centrioles at each pole of the cell send out spindle fibers that grasp each chromosome. The cell is preparing to separate the chromosomes.

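Anaphase. During anaphase the cell chromosomes are separated. Each replicated chromosome consists of two identical copies, and the spindle fibers shorten so that one copy of each chromosome is pulled to one end of the cell while the other copy is pulled to the other end. (Each copy contains one original DNA strand and one newly synthesized strand, so neither set of chromosomes is entirely "original" or entirely "new.")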

Telophase. During telophase, separation of chromosomes is complete. The cell begins to break apart into two cells. The chromosomes begin to uncoil. Nuclear membranes begin to reform around the chromosomes.

Daughter Cells. When mitosis is complete, the cell divides into two new cells, each resembling the original interphase resting cells, but smaller. Two cells now exist as a result of mitosis. Each cell contains a complete, identical copy of the DNA, and in each copy one DNA strand is original and the other is newly synthesized. Each cell has about one half the biomass of the original cell. Soon each cell will acquire nutrients and will grow in size so as to acquire the size that is normal for the cell type.

Allium. Seen below are phases of mitosis as seen in tissue sections of onion (Allium) root tip. Root tips are excellent tissue sections to study to learn mitosis, since root tips are rapidly growing and thus have many cells in stages of replication. Test your knowledge: can you spot the cells undergoing cellular mitosis? Can you name the phase for such cells?

The cell in the very center is in the phase of mitosis known as anaphase. Notice the chromosomes splitting- half moving to the right, half moving to the left. The spindle fibers are faintly visible. The cells to either side of the anaphase cell are in interphase.

This is a very low magnification photograph of onion root tip cells. Can you spot the cell undergoing metaphase in the center of this tissue section of about 50 cells? Also, the cell along the bottom, 4th from the left, is in metaphase.

About 8 cells are seen here. In the lower left is a cell in anaphase. In the middle and somewhat towards the top is a cell in metaphase (aligned chromosomes). The other cells are in interphase and prophase.

The cell in the very center is in the phase of mitosis known as prophase. The chromosomes are coiled and are randomly arranged in the cell center. Just above the prophase cell is a cell that is just ending telophase- with daughter cells forming.

The cell in the upper left is undergoing anaphase (first row, first cell on left). Move just one cell to the right and down one cell and you will see a cell in late telophase - with a cell plate having formed down the middle and with two nuclei of the soon-to-be daughter cells reforming.
Meiosis is a mode of cell replication that occurs only in the gonads (testis and ovary) of eukaryotes, in order to produce germ cells (sperm and egg cells, not 'germs' such as bacteria). Meiosis is a reduction division, where a cell's content of genetic material is reduced to form daughter cells having 1/2 the amount of DNA (and genes) found in regular body cells. Following meiosis, sperm and egg cells potentially combine during fertilization to form a fertilized egg called a zygote. The zygote now has the full complement of genetic material (1/2 + 1/2=1). When viewed under the microscope, the stages of meiosis can appear very similar to those of mitosis, so phases of meiosis will not be shown here.

Uncontrolled replication of cells leads to cell overgrowths, that is tumors. Tumors can be classified as benign or malignant. Benign tumors are excessive cell growths that do not invade other tissues and usually do not cause significant harm. Malignant tumors, that is cancers, are cell growths where the cells are replicating without any inhibition of cell growth, and they will cause death to the organism if allowed to continue growing.

Naming Conventions for Tumors

Here are the naming conventions used for the more common tumors:
• Carcinomas are cancers of epithelial tissues (cells lining the surfaces of an organism).
• Sarcomas are cancers of connective tissues.
• Leukemias are cancers of white blood cells.
• Lymphomas are tumors of the lymph nodes.
• Osteomas are tumors of bone.
• Osteosarcomas are sarcomas of bone tissue.
• Neuromas are benign tumors of nerve tissue.
• Leiomyomas are benign tumors of smooth muscle tissue.
• Rhabdomyomas are benign tumors of voluntary (skeletal) muscle.
• Chondromas are benign tumors of cartilage.
• Chondrosarcomas are malignant tumors of cartilage.
• Adenomas are benign tumors of glandular tissue.
• Adenocarcinomas are malignant tumors of glandular tissue.

Look at the photo below- it is from a biopsy of a cancer. Several (3) cells show visible stages of mitosis (dark coiled chromosomes), suggesting that the tissue is cancerous. (Tissues have a certain percentage of their cells undergoing mitosis, called the mitotic index; when the mitotic index is high, as with the tissue below, a cancer or tumor of some sort is suspected.)
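The mitotic index mentioned above is simply a percentage, so it can be written as a one-line calculation. This sketch is only an illustration: the text gives the 3 mitotic cells seen in the biopsy but not the total number of cells in the field, so the total of 60 used here is a hypothetical value, and no cutoff for a "high" index is implied.

```python
def mitotic_index(mitotic_cells, total_cells):
    """Percentage of counted cells that show visible stages of mitosis."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= mitotic_cells <= total_cells:
        raise ValueError("mitotic_cells must be between 0 and total_cells")
    return 100.0 * mitotic_cells / total_cells


# 3 mitotic cells (from the text) out of a hypothetical 60 cells counted
print(mitotic_index(3, 60), "percent")
```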


Agents that can trigger cells to become tumorous include: environmental carcinogens in food, water, or air; cancer-causing genes called oncogenes that are transmitted by certain viruses; and inherent oncogenes, triggered by repeated trauma to a cell.

Cellular Arrangements and Tissues
There are four tissue types: nervous tissue, muscle tissue, connective tissue, and epithelial tissue. All multi-celled animal life forms are composed of various combinations of these four tissues.

BASIC TISSUE TYPES

Nervous Tissue is specialized for creating and conducting electrical signals, and includes neurons (nerve cells) as part of its tissue. Neurons are the cells adapted for receiving and eliciting electrical signals. Signals are sent to other neurons, glands, and muscle cells. The photo on the right shows a classic nerve cell ("neuron") appearance- pointed edges giving it a quality somewhat like a "ninja star" or thorn.

Muscle Tissue is specialized for cellular contraction, and hence movement of the organism or parts of the organism. The photo on the right shows several muscle cells of the heart. Muscle cells tend to be elongated and red in appearance. Heart muscle has the characteristic cellular branchings such as are seen in this tissue section.

Connective Tissue is specialized to connect parts of an organism. Types of connective tissue include loose connective tissue (like fascia, the filmy material you see when you pull the skin off a chicken), tendons, ligaments, and so on. The photo on the right shows a section of bone tissue, just one of the many types of connective tissues (tendons, ligaments, bone, cartilage, fat, and blood).

Epithelial Tissue lines body surfaces, both internal and external, and is adapted for protection, secretion, and absorption. Epithelium is named, that is classified, according to its outer cell layer's shape, whether the tissue is one cell thick ("simple") or is layered ("stratified"), whether the outer cells have cilia, and whether some of the cells are goblet shaped mucous secreting cells. The photo on the right is a 3D scanning electron microscope photo showing several relatively flat epithelial cells covering a tissue surface.

Remember-- you are only to learn to differentiate the four basic tissue types! You are NOT expected to learn each of the specific tissue subtypes of the four basic tissues. So don't panic when you view all the different tissue subtypes.

NOTE: For more experience studying tissues, visit the histology lab center where you can learn more about cell arrangements and tissue types. Many digital images are available there for your viewing.

Tissue Development

Development of tissues occurs from primitive embryonic cell layers called germ layers. There are 3 germ layers that form in the embryonic cell mass:

GERM LAYER                         DEVELOPS INTO...
ectoderm (outer shell of cells)    skin, brain, eye, nerves
mesoderm (middle cell layer)       muscle, bone, vessels, connective tissues
endoderm (inner cell layer)        gut, liver, pancreas

Fertilization and Zygote Formation

When a sperm cell fertilizes an egg cell, a fertilized egg or zygote is formed. The zygote then divides into 2 cells, then 4 cells, then 8 cells, then 16 cells, then 32 cells, then 64 cells, then 128 cells. Note that growth is at a geometric rate. The cell number of a developing embryo increases at a fantastic rate as a single fertilized cell matures into an embryo and then a fetus (nymphs or larvae in the case of insects, worms, and so on).

Morula Formation

As cell mass increases from a fertilized egg dividing and with geometric cell mass increase, the embryo begins looking like a bunch of mulberries (well, sort of if you use some imagination), so that is what it is called. Except "mulberry" is translated into Latin, the universal scientific language, to form the word morula.

Blastula Formation

Soon the mulberry (morula) hollows out, forming a hollow cavity, sort of like a blown-up balloon, and it is then called a blastula.

Gastrula Formation

One end of the blastula invaginates, sort of like pushing your finger into the blown-up balloon. Now the embryo is said to be a gastrula; gastrulation has occurred. Note that there are now two cavities- the cavity in the balloon filled with air (call this cavity #1) and the cavity formed by gastrulation (call this cavity #2). Cavity #1 will become the thoracic and abdominal cavities, and cavity #2 will become the gastrointestinal tract (Did you notice the "gast-" prefix in both gastrulation and gastrointestinal tract?).

Ectoderm

The outer layer of cells, that is the outer skin of the balloon, is what is called the ectoderm germ layer of cells, and as the embryo continues to grow and differentiate into a fetus the ectoderm cells will form tissues and organs such as skin, nervous tissue, brain, and the eye.

Mesoderm

The middle layer of cells, that is the inner skin of the balloon, is what is called the mesoderm germ layer of cells, and as the embryo continues to grow and differentiate into a fetus the mesoderm cells will form tissues such as muscle, blood vessels, cartilage, bone, ligaments, and other connective tissues.

Endoderm

The layer of cells lining the gastrulation cavity (cavity #2), that is the skin of the balloon surrounding your finger that you poked into the balloon, is what is called the endoderm germ layer of cells, and as the embryo continues to grow and differentiate into a fetus the endoderm cells will form epithelium lining the entire gut.

Tissue Components

Tissues are made of matrix and cells. Matrix is the non-cellular material between tissue cells, secreted by cells; matrix consists of both organic components (such as collagen and elastic proteins to give tissues strength and elasticity) and inorganic components (such as water and minerals).

Useful Suffixes

Cells of tissues are named according to their tissue type, but many cells share common suffixes that reveal clues about their function.
"-cytes" are mature cells that perform common tissue functions.
"-blasts" are immature tissue cells that give rise to other mature tissue cells.
"-clasts" are tissue destroying cells.

Homo sapiens Map View, Chromosome 1

Master map: Genes On Sequence. Total genes on chromosome: 955. Region displayed: 0-220467 Kbp. Genes labeled: 20. Total genes in region: 955.

Cytogenetic location    Full name
1pter-1p12              DKFZP564C186 protein
1p36-p22                spermidine synthase
1p35                    phospholipase A2, group IIA (platelets, synovial fluid)
1                       PRO2047 protein
1                       hypothetical protein FLJ10468
1p35-p34                fatty acid amide hydrolase
1p32                    complement component 8, beta polypeptide
1p31.2-p31.1            growth arrest and DNA-damage-inducible, alpha
1pter-1q31.1            protein kinase C-like 2
1pter-1q31.1            ATPase inhibitor precursor
1                       hypothetical protein FLJ10330
1q21.1                  U4/U6-associated RNA splicing factor
1q21                    aryl hydrocarbon receptor nuclear translocator
1q21                    jumping translocation breakpoint
1q21.1                  phosphoprotein enriched in astrocytes 15
1q23                    coagulation factor V (proaccelerin, labile factor)
1                       hypothetical protein FLJ10083
1                       suppression of tumorigenicity 16 (melanoma differentiation)
1q41                    estrogen-related receptor gamma
1q42.1-q42.2            Chediak-Higashi syndrome 1

Archaeoglobus fulgidus, complete genome 49546..99545
62 protein coding genes

Legend Find Open Reading Frames Coding region on direct strand Coding region on complementary strand Overlapping region

Genetic States

Disease Histogram of Chromosome

Genetic Manipulation

Codons Found In DNA
First       Second Position of Codon                                            Third
Position    T               C               A               G                   Position
T           TTT Phe [F]     TCT Ser [S]     TAT Tyr [Y]     TGT Cys [C]         T
            TTC Phe [F]     TCC Ser [S]     TAC Tyr [Y]     TGC Cys [C]         C
            TTA Leu [L]     TCA Ser [S]     TAA Ter [end]   TGA Ter [end]       A
            TTG Leu [L]     TCG Ser [S]     TAG Ter [end]   TGG Trp [W]         G
C           CTT Leu [L]     CCT Pro [P]     CAT His [H]     CGT Arg [R]         T
            CTC Leu [L]     CCC Pro [P]     CAC His [H]     CGC Arg [R]         C
            CTA Leu [L]     CCA Pro [P]     CAA Gln [Q]     CGA Arg [R]         A
            CTG Leu [L]     CCG Pro [P]     CAG Gln [Q]     CGG Arg [R]         G
A           ATT Ile [I]     ACT Thr [T]     AAT Asn [N]     AGT Ser [S]         T
            ATC Ile [I]     ACC Thr [T]     AAC Asn [N]     AGC Ser [S]         C
            ATA Ile [I]     ACA Thr [T]     AAA Lys [K]     AGA Arg [R]         A
            ATG Met [M]     ACG Thr [T]     AAG Lys [K]     AGG Arg [R]         G
G           GTT Val [V]     GCT Ala [A]     GAT Asp [D]     GGT Gly [G]         T
            GTC Val [V]     GCC Ala [A]     GAC Asp [D]     GGC Gly [G]         C
            GTA Val [V]     GCA Ala [A]     GAA Glu [E]     GGA Gly [G]         A
            GTG Val [V]     GCG Ala [A]     GAG Glu [E]     GGG Gly [G]         G
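The DNA codon table above is regular enough to be generated rather than typed out. The sketch below assumes the same T, C, A, G ordering used in the table for all three positions, with a 64-letter string holding the one-letter amino acid codes in that order (`*` marks a terminator). The variable names are illustrative, not part of the original material.

```python
from collections import Counter

BASES = "TCAG"  # same order as the table: T, C, A, G
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)

# Map each of the 64 DNA codons to its one-letter amino acid code
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# How many different codons specify each amino acid (or terminator)
degeneracy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE), "codons")
print("ATG ->", CODON_TABLE["ATG"])
print("codons for Leu:", degeneracy["L"])
print("terminator codons:", degeneracy["*"])
```

Counting the entries this way makes the redundancy of the code visible: leucine, serine, and arginine each have six codons, while methionine and tryptophan have one.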


Codons Found In Messenger RNA






An explanation of the Genetic Code:

DNA is a two-stranded molecule. Each strand is a polynucleotide composed of A (adenosine), T (thymidine), C (cytidine), and G (guanosine) residues polymerized by "dehydration" synthesis in linear chains with specific sequences. Each strand has polarity, such that the 5'-hydroxyl (or 5'-phospho) group of the first nucleotide begins the strand and the 3'-hydroxyl group of the final nucleotide ends the strand; accordingly, we say that this strand runs 5' to 3' ("Five prime to three prime"). It is also essential to know that the two strands of DNA run antiparallel such that one strand runs 5' -> 3' while the other one runs 3' -> 5'. At each nucleotide residue along the double-stranded DNA molecule, the nucleotides are complementary. That is, A forms two hydrogen bonds with T; C forms three hydrogen bonds with G. In most cases the two-stranded, antiparallel, complementary DNA molecule folds to form a helical structure which resembles a spiral staircase. This is the reason why DNA has been referred to as the "Double Helix".

One strand of DNA holds the information that codes for various genes; this strand is often called the template strand or antisense strand (containing anticodons). The other, and complementary, strand is called the coding strand or sense strand (containing codons). Since mRNA is made from the template strand, it has the same information as the coding strand. The table above refers to triplet nucleotide codons along the sequence of the coding or sense strand of DNA as it runs 5' -> 3'; the code for the mRNA would be identical but for the fact that RNA contains U (uridine) rather than T. An example of two complementary strands of DNA would be:

(5' -> 3') ATGGAATTCTCGCTC    (Coding, sense strand)
(3' <- 5') TACCTTAAGAGCGAG    (Template, antisense strand)
(5' -> 3') AUGGAAUUCUCGCUC    (mRNA made from Template strand)

Since amino acid residues of proteins are specified as triplet codons, the protein sequence made from the above example would be Met-Glu-Phe-Ser-Leu... (MEFSL...).

Practically, codons are "decoded" by transfer RNAs (tRNA) which interact with a ribosome-bound messenger RNA (mRNA) containing the coding sequence. There are 64 different codons. 61 of these specify amino acids and are recognized by tRNAs, each of which has an anticodon loop (used to recognize codons in the mRNA) and a bound aminoacyl residue; the appropriate "charged" tRNA binds to the respective next codon in the mRNA and the ribosome catalyzes the transfer of the amino acid from the tRNA to the growing (nascent) protein/polypeptide chain. The remaining 3 codons are used for "punctuation"; that is, they signal the termination (the end) of the growing polypeptide chain.

Lastly, the Genetic Code in the table above has also been called "The Universal Genetic Code". It is known as "universal" because it is used by all known organisms as a code for DNA, mRNA, and tRNA. The universality of the genetic code encompasses animals (including humans), plants, fungi, archaea, bacteria, and viruses. However, all rules have their exceptions, and such is the case with the Genetic Code; small variations in the code exist in mitochondria and certain microbes. Nonetheless, it should be emphasized that these variances represent only a small fraction of known cases, and that the Genetic Code applies quite broadly, to nearly all known nuclear genes.
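The worked example above (coding strand, template strand, mRNA, protein) can be reproduced step by step. This is a minimal sketch using the standard code in U, C, A, G order; the function names are illustrative, and the template strand is returned in the same left-to-right alignment as the example, that is, written 3' -> 5' underneath the coding strand.

```python
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# mRNA codon -> one-letter amino acid code ('*' = stop)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}      # DNA base -> DNA partner
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> mRNA base


def template_from_coding(coding):
    """Template strand aligned base-for-base under the coding strand (reads 3' -> 5')."""
    return "".join(DNA_PAIR[base] for base in coding)


def transcribe(template):
    """mRNA (5' -> 3') made by pairing RNA bases against the aligned template strand."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template)


def translate(mrna):
    """Read codons from the first base until a stop codon or the end of the message."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


coding = "ATGGAATTCTCGCTC"
template = template_from_coding(coding)
mrna = transcribe(template)
print(template)
print(mrna)
print(translate(mrna))
```

The three printed lines should match the template strand, the mRNA, and the one-letter protein sequence given in the example.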

Codon Tables
First &     Third Position
Second      A      C      G      U
Position
AA          Lys    Asn    Lys    Asn
AC          Thr    Thr    Thr    Thr
AG          Arg    Ser    Arg    Ser
AU          Ile    Ile    MET    Ile
CA          Gln    His    Gln    His
CC          Pro    Pro    Pro    Pro
CG          Arg    Arg    Arg    Arg
CU          Leu    Leu    Leu    Leu
GA          Glu    Asp    Glu    Asp
GC          Ala    Ala    Ala    Ala
GG          Gly    Gly    Gly    Gly
GU          Val    Val    Val    Val
UA          .      Tyr    .      Tyr
UC          Ser    Ser    Ser    Ser
UG          .      Cys    Trp    Cys
UU          Leu    Phe    Leu    Phe

( . = stop codon)


Another way to look at this is:
NAME             3 Letter        1 Letter        Codons for each Amino Acid
                 Abbreviation    Abbreviation    (written as mRNA)
Alanine          Ala             A               GCA,GCC,GCG,GCU
Cysteine         Cys             C               UGC,UGU
Aspartic Acid    Asp             D               GAC,GAU
Glutamic Acid    Glu             E               GAA,GAG
Phenylalanine    Phe             F               UUC,UUU
Glycine          Gly             G               GGA,GGC,GGG,GGU
Histidine        His             H               CAC,CAU
Isoleucine       Ile             I               AUA,AUC,AUU
Lysine           Lys             K               AAA,AAG
Leucine          Leu             L               UUA,UUG,CUA,CUC,CUG,CUU
Methionine       Met             M               AUG
Asparagine       Asn             N               AAC,AAU
Proline          Pro             P               CCA,CCC,CCG,CCU
Glutamine        Gln             Q               CAA,CAG
Arginine         Arg             R               AGA,AGG,CGA,CGC,CGG,CGU
Serine           Ser             S               AGC,AGU,UCA,UCC,UCG,UCU
Threonine        Thr             T               ACA,ACC,ACG,ACU
Valine           Val             V               GUA,GUC,GUG,GUU
Tryptophan       Trp             W               UGG
Tyrosine         Tyr             Y               UAC,UAU
Stop Codons      .               .               UAA,UAG,UGA



An example of the multiple codon combinations possible for a single peptide is spelling my first name (without a termination codon). Met has 1 codon, Ala has 4, Arg has 6, and Lys has 2, so there are 1 x 4 x 6 x 2 = 48 combinations that code for 'MARK'. The 16 combinations listed here are those that use AGA or AGG for Arg; the other 32 use CGU, CGC, CGA, or CGG. Other sequences of 4 letters would vary in the number of possibilities based on the number of codons that could code for each amino acid. Some amino acids have up to 6 codons that will be translated into a single amino acid.

M   A   R   K
MET Ala Arg Lys
===============
AUG-GCU-AGA-AAG
AUG-GCG-AGA-AAG
AUG-GCC-AGA-AAG
AUG-GCA-AGA-AAG

AUG-GCU-AGG-AAG
AUG-GCG-AGG-AAG
AUG-GCC-AGG-AAG
AUG-GCA-AGG-AAG

AUG-GCU-AGA-AAA
AUG-GCG-AGA-AAA
AUG-GCC-AGA-AAA
AUG-GCA-AGA-AAA

AUG-GCU-AGG-AAA
AUG-GCG-AGG-AAA
AUG-GCC-AGG-AAA
AUG-GCA-AGG-AAA
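The number of ways to spell 'MARK' can be checked by enumerating every combination of synonymous codons. This is a sketch under the standard genetic code; the helper names `codons_for` and `encodings` are illustrative only. With all six arginine codons the count is 48, and restricting arginine to AGA and AGG gives the 16 sequences listed above.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# mRNA codon -> one-letter amino acid code ('*' = stop)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_for(amino_acid):
    """All mRNA codons that specify the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def encodings(peptide):
    """Every mRNA sequence (codons joined by '-') that translates to `peptide`."""
    choices = [codons_for(aa) for aa in peptide]
    return ["-".join(combo) for combo in product(*choices)]


mark = encodings("MARK")
aga_agg_only = [seq for seq in mark if seq.split("-")[2] in ("AGA", "AGG")]

print(len(mark), "encodings in total")
print(len(aga_agg_only), "encodings using AGA or AGG for Arg")
```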

Clusters of Orthologous Groups

Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in 34 complete genomes, representing 26 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Proteins from two eukaryotic genomes were assigned to COGs and can be reached from each individual COG page.

Name                                    Proteins    in COGs
Archaeoglobus fulgidus                  2420        1849
Halobacterium sp. NRC-1                 2058        1404
Methanococcus jannaschii                1786        1320
Methanobacterium thermoautotrophicum    1873        1375
Thermoplasma acidophilum                1479        1176
Pyrococcus horikoshii                   2080        1365
Pyrococcus abyssi                       1767        1443
Aeropyrum pernix                        2722        1169
Saccharomyces cerevisiae                5954        2175
Aquifex aeolicus                        1560        1317
Thermotoga maritima                     1858        1507
Deinococcus radiodurans                 3194        2176
Mycobacterium tuberculosis              3924        2468
Bacillus subtilis                       4118        2803
Bacillus halodurans                     4066        2728
Synechocystis                           3168        2113
Escherichia coli                        4286        3327
Buchnera sp. APS                        575         559
Pseudomonas aeruginosa                  5567        4191
Vibrio cholerae                         3834        2745
Haemophilus influenzae                  1695        1504
Xylella fastidiosa                      2766        1491
Neisseria meningitidis                  2081        1455
Helicobacter pylori                     1578        1081
Helicobacter pylori J99                 1492        1062
Campylobacter jejuni                    1634        1289
Rickettsia prowazekii                   836         674
Chlamydia trachomatis                   895         631
Chlamydia pneumoniae                    1053        647
Treponema pallidum                      1036        707
Borrelia burgdorferi                    1637        694
Ureaplasma urealyticum                  613         401
Mycoplasma pneumoniae                   689         423
Mycoplasma genitalium                   471         376
Total                                   76,765      51,645

Gene Classification based on COG functional categories Protein coding genes distribution map To see map locations of genes, click on a region in the map, to zoom in on that region

Birgid Schlindwein's

Hypermedia Glossary Of Genetic Terms

Chromosome
The term was proposed by Waldeyer (1888) for the individual threads within a cell nucleus (gk. chroma, colour; soma, body). The self-replicating genetic structures of cells containing the cellular DNA that bears in its nucleotide sequence the linear array of genes. In prokaryotes, chromosomal DNA is circular, and the entire genome is carried on one chromosome. Eukaryotic genomes consist of a number of chromosomes whose DNA is associated with different kinds of proteins.

Related Terms:

Nucleus
The term introduced by Brown (1833) for the more or less spherical structure which occurs in cells and stains deeply with basic dyes. The cellular organelle in eukaryotes that contains the genetic material.

Nucleotide
A subunit of DNA or RNA consisting of a nitrogenous base (a purine, adenine or guanine, or a pyrimidine: thymine or cytosine in DNA, uracil or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Depending on the sugar, the nucleotides are called deoxyribonucleotides or ribonucleotides. Thousands of nucleotides are linked to form a DNA or RNA molecule. See also base pair.

Gene
The term coined by Johannsen (1909) for the fundamental physical and functional unit of heredity. The word gene was derived from De Vries' term pangen, itself a derivative of the word pangenesis which Darwin (1868) had coined. A gene is an ordered sequence of nucleotides located in a particular position (locus) on a particular chromosome that encodes a specific functional product (the gene product, i.e. a protein or RNA molecule). It includes regions involved in regulation of expression and regions that code for a specific functional product. See gene expression, allele.

Prokaryote
Cell or organism lacking a membrane-bound, structurally discrete nucleus and other subcellular compartments. Bacteria are prokaryotes. Compare eukaryote. See chromosomes.

Eukaryote
Cell or organism with membrane-bound, structurally discrete nucleus and other well-developed subcellular compartments. Eukaryotes include all organisms except viruses, bacteria (including blue-green algae), and archaea. Compare prokaryote. See chromosomes.


Protein
A large molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins are required for the structure, function, and regulation of the body's cells, tissues, and organs, and each protein has unique functions. Examples are hormones, enzymes, and antibodies.

Related Terms:

Deoxyribonucleic acid (DNA)
The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides. The four nucleotides in DNA contain the bases: adenine (A), guanine (G), cytosine (C), and thymine (T). In nature, base pairs form only between A and T and between G and C; thus the base sequence of each single strand can be deduced from that of its partner.

DNA sequence
The relative order of base pairs, whether in a fragment of DNA, a gene, a chromosome, or an entire genome. See base sequence.

Gene
See the entry under Chromosome, above.

Genetic code
The sequence of nucleotides, coded in triplets (codons) along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.

Related Terms:

Nucleotide
See the entry under Chromosome, above.

Codon
The term proposed by Crick (1963) for the sequence of nucleotides in DNA or RNA which is responsible for determining that a specific amino acid shall be inserted into a polypeptide chain. There is more than one codon for most amino acids. It has now been established that the codon is a triplet of nitrogenous bases in DNA or RNA that specifies a single amino acid. See genetic code.

Messenger RNA (mRNA)
RNA that serves as a template for protein synthesis or for synthesis of cDNA. See genetic code.

Amino acid
Any of a class of 20 molecules that are combined to form proteins in living things. The sequence of amino acids in a protein and hence protein function are determined by the genetic code. Amino acids contain a basic amino (NH2) group, an acidic carboxyl (COOH) group and a side chain (R - of a number of different kinds) attached to an alpha carbon atom. Thus the general formula is: H2N-CH(R)-COOH.

Related Terms:
Yeast artificial chromosome (YAC) A vector used to clone DNA fragments (up to 400 kb); it is constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. The inserts can be much larger than those accepted by other vectors such as plasmids or cosmids. (Cf. cloning vector).

Related Terms:
Sequence tagged site (STS) A short (200 to 500 base pairs) sequence of genomic DNA that has a single occurrence in the human genome and whose location and base sequence are known. Detectable by polymerase chain reaction, STSs are useful for localizing and orienting the mapping and sequence data reported from many different laboratories, and they serve as landmarks on the developing physical map of the human genome. An expressed sequence tag (EST) is an STS derived from cDNA.

Related Terms:

Single nucleotide polymorphism (SNP) Sequence polymorphism differing in a single base pair. Example of a single nucleotide substitution: rice cultivars with 18% or less amylose had the sequence AGTTATA at the putative leader intron 5' splice site, while all cultivars with a higher proportion of amylose had AGGTATA.
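The rice example can be checked mechanically by comparing the two sequences position by position. The sketch below is only an illustration: it assumes both sequences are already aligned and of equal length, and the function name `single_base_differences` is hypothetical.

```python
# Illustrative sketch: list the positions at which two aligned sequences differ.
# single_base_differences is a hypothetical helper, not part of any library.
def single_base_differences(seq_a: str, seq_b: str):
    """Return (position, base_in_a, base_in_b) tuples, with positions counted from 1."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


if __name__ == "__main__":
    low_amylose = "AGTTATA"   # sequence quoted for cultivars with 18% or less amylose
    high_amylose = "AGGTATA"  # sequence quoted for cultivars with more amylose
    print(single_base_differences(low_amylose, high_amylose))
```

For the two sequences quoted in the entry, the only difference is at the third base, where T is replaced by G.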

The unit of hereditary information that occupies a fixed position (locus) on a chromosome. Genes achieve their effects by directing the synthesis of proteins. Genes are composed of deoxyribonucleic acid (DNA), except in some viruses, which have genes consisting of a closely related compound called ribonucleic acid (RNA). A DNA molecule is composed of two chains of nucleotides that wind about each other to resemble a twisted ladder. The sides of the ladder are made up of sugars and phosphates; the rungs are formed by bonded pairs of nitrogenous bases. These bases are adenine (A), guanine (G), cytosine (C), and thymine (T). An A on one chain bonds to a T on the other (thus forming an A-T ladder rung); similarly, a C on one chain bonds to a G on the other. If the bonds between the bases are broken, the two chains unwind, and free nucleotides within the cell attach themselves to the exposed bases of the now-separated chains. The free nucleotides line up along each chain according to the base-pairing rule--A bonds to T, C bonds to G. This process results in the creation of two identical DNA molecules from one original and is the method by which hereditary information is passed from one generation of cells to the next.

The sequence of bases along a strand of DNA determines the genetic code. When the product of a particular gene is needed, the portion of the DNA molecule that contains that gene will split. A strand of RNA with bases complementary to those of the gene is created from the free nucleotides in the cell. (RNA has the base uracil [U] instead of thymine, so A and U form base pairs during RNA synthesis.) This single chain of RNA, called messenger RNA (mRNA), then passes to the organelles called ribosomes, where protein synthesis takes place. A second type of RNA, transfer RNA (tRNA), matches up the nucleotides on mRNA with specific amino acids. Each set of three nucleotides codes for one amino acid.
The series of amino acids built according to the sequence of nucleotides forms a polypeptide chain; all proteins are made from one or more linked polypeptide chains. Experiments indicate that one gene is responsible for the assembly of one polypeptide chain. This is known as the one-gene-one-polypeptide hypothesis.

Other experiments have shown that many of the genes within a cell are inactive much or even all of the time. Thus, at any time, it seems that a gene can be switched on or off. The process by which genes are activated and deactivated in bacteria has been determined. Bacteria actually have three types of genes: structural, operator, and regulator. Structural genes code for the synthesis of specific polypeptides. Operator genes contain the code necessary to begin the process of transcribing the DNA message of one or more structural genes into mRNA. Thus, structural genes are linked to an operator gene in a functional unit called an operon. Ultimately, the activity of the operon is controlled by a regulator gene, which produces a small protein molecule called a repressor. The repressor binds to the operator gene and prevents it from initiating the synthesis of the protein called for by the operon. The presence or absence of certain repressor molecules determines whether the operon is off or on. As mentioned, this model applies to bacteria. Gene regulation in higher organisms is less clearly understood.

Mutations occur when the number or order of bases in a gene is disrupted. Nucleotides can be deleted, doubled, rearranged, or replaced, with each alteration having a particular effect. A mutation generally has little or no effect; when it does alter an organism, the change is frequently lethal. A beneficial mutation will rise in frequency within a population until it becomes the norm.
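The transcription and triplet-reading steps described in this section can be sketched in a few lines of Python. This is a simplified illustration rather than a model of the cell: the function names are hypothetical, the input is assumed to be the DNA strand being copied (so A pairs with U, T with A, G with C, and C with G), and `CODON_SUBSET` holds only four entries of the standard genetic code, chosen for the example.

```python
# Illustrative sketch of transcription (DNA -> mRNA) and triplet reading.
# All names are hypothetical; CODON_SUBSET is a four-entry excerpt of the
# standard genetic code, not the full table of 64 codons.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
CODON_SUBSET = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "STOP"}


def transcribe(template: str) -> str:
    """Build the mRNA whose bases are complementary to the DNA template."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


def codons(mrna: str) -> list:
    """Split an mRNA string into complete sets of three nucleotides."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def translate(mrna: str) -> list:
    """Map each codon to an amino acid until a stop codon is reached."""
    chain = []
    for codon in codons(mrna):
        amino_acid = CODON_SUBSET.get(codon, "?")  # "?" marks codons outside the subset
        if amino_acid == "STOP":
            break
        chain.append(amino_acid)
    return chain


if __name__ == "__main__":
    template = "TACAAACCGATT"
    mrna = transcribe(template)
    print(mrna)
    print(codons(mrna))
    print(translate(mrna))
```

Changing a single base of `template` and rerunning the script shows how a replacement of one nucleotide can change one amino acid in the resulting chain, or none of them if the new codon falls outside the small table used here.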

The Cell
In biology, the basic unit of which all living things are composed. As the smallest units retaining the fundamental properties of life, cells are the "atoms" of the living world. A single cell is often a complete organism in itself, such as a bacterium or yeast. Other cells, by differentiating in order to acquire specialized functions and cooperating with other specialized cells, become the building blocks of large multicellular organisms as complex as the human being. Although they are much larger than atoms, these building blocks are still very small. The smallest known cells are a group of tiny bacteria called mycoplasmas; some of these single-celled organisms are spheres about 0.3 micrometre in diameter, with a total mass of 10^-14 gram--equal to that of 8,000,000,000 hydrogen atoms. Human cells typically have a mass 400,000 times larger, but even they are only about 20 micrometres across. It would require a sheet of about 10,000 human cells to cover the head of a pin, and each human being is composed of more than 75,000,000,000,000 cells.

Figure 1: The initial proposal of the structure of DNA by James Watson and ...

This article discusses the cell both as an individual unit and as a contributing part of a larger organism. As an individual unit the cell is capable of digesting its own nutrients, providing its own energy, and replicating itself in order to produce succeeding generations. It can be viewed as an enclosed vessel composed of even smaller units that serve as its skin, skeleton, brain, and digestive tract. Within this vessel innumerable chemical reactions take place simultaneously, all of them controlled so that they contribute to the life and procreation of the cell. In a multicellular organism cells specialize to perform different functions. In order to do this each cell keeps in constant communication with its neighbours. As it receives nutrients from and expels wastes into its surroundings, it adheres to and cooperates with other cells. Cooperative assemblies of similar cells form tissues, and a cooperation between tissues in turn forms organs, the functional units of an organism. Special emphasis is given in this article to animal cells, with some discussion of the energy-synthesizing processes and extracellular components peculiar to plants. For detailed discussion of the biochemistry of plant cells, see photosynthesis. For full-length treatment of the genetic events in the cell nucleus, see heredity.

Contents of this article: Introduction The nature and function of cells The cell as a self-replicating collection of catalysts The structure of biologic catalysts Coupled chemical reactions Photosynthesis: the beginning of the food chain ATP: fueling chemical reactions The cell as a replicator of information DNA: the genetic material RNA: replicated from DNA The cell as an organized unit Intracellular communication Intercellular communication The plasma membrane Chemical composition and structure of the membrane Membrane lipids Membrane proteins Membrane fluidity Transport across the membrane Permeation Membrane channels Facilitated diffusion The glucose transporter The anion transporter Secondary active transport Counter-transport Co-transport Primary active transport The sodium pump Calcium pumps Hydrogen ion pumps

Transport of particles Endocytosis Exocytosis Internal membranes General functions and characteristics Cellular organelles and their membranes The vacuole The lysosome Microbodies The endoplasmic reticulum The smooth endoplasmic reticulum The rough endoplasmic reticulum The Golgi apparatus Secretory vesicles Sorting of products by chemical receptors The nucleus Structural organization of the nucleus DNA packaging Nucleosomes: the subunits of chromatin Organization of chromatin fibre The nuclear envelope Genetic organization of the nucleus The structure of DNA Rearrangement and modification of DNA Genetic expression through RNA RNA synthesis Processing of mRNA Regulation of genetic expression Regulation of RNA synthesis Regulation of RNA after synthesis The mitochondrion and the chloroplast Mitochondrial and chloroplastic structure Metabolic functions The mitochondrion Formation of the electron donors NADH and FADH2 The electron-transport chain The chemiosmotic theory The chloroplast Trapping of light Fixation of carbon dioxide. Evolutionary origins The mitochondrion and chloroplast as independent entities The endosymbiont hypothesis The cytoskeleton Actin filaments Microtubules

Intermediate filaments Structural relation of the filaments The cell matrix and cell-to-cell communication The extracellular matrix Matrix polysaccharides Matrix proteins Cell-matrix interactions Intercellular recognition and cell adhesion Tissue and species recognition Cell junctions Adhering junctions Tight junctions Gap junctions Cell-to-cell communication via chemical signaling Types of chemical signaling Signal receptors Cellular response The plant cell wall Mechanical properties of wall layers Components of the cell wall Cellulose Matrix polysaccharides Proteins Plastics Intercellular communication Plasmodesmata Oligosaccharides with regulatory functions Cell division and growth Duplication of the genetic material Cell division Mitosis and cytokinesis Meiosis The cell division cycle Controlled proliferation Failure of proliferation control Cell differentiation The differentiated state The process of differentiation Embryonic differentiation Adult differentiation Errors in differentiation The evolution of cells The development of genetic information The development of metabolism The history of cell theory Formulation of the theory

Early observations The problem of the origin of cells The protoplasm concept Contribution of other sciences Bibliography General works Nature and function of cells Special studies in cell morphology Special studies in cell biology Evolution

Summary

In biology, the basic unit of which all living things are composed. The cell is the smallest structural unit of living matter that is capable of functioning independently. All cells are similar in composition, form, and function. A single cell can be a complete organism in itself, as in bacteria and protozoans. Groups of specialized cells are organized into tissues and organs in multicellular organisms such as the higher plants and animals. Cells were first observed in the 17th century, shortly after the invention of the microscope. Their significance, however, was not understood until the early 19th century, when improvements in microscopy permitted closer observation.

Cells are made up of macromolecules (giant molecules) and various smaller molecules. The chief macromolecules are nucleic acids (DNA [deoxyribonucleic acid] and RNA [ribonucleic acid]), proteins, and polysaccharides. DNA comprises the genetic code that carries the essential character of the organism from generation to generation. RNA translates the genetic information into proteins, which carry out vital cell functions. Proteins, for example, recognize and transport specific molecules into and out of the cell and catalyze all chemical reactions within the cell. Polysaccharides function as structural molecules in the rigid cell walls of bacterial and plant cells and as storage molecules in the glycogen granules of vertebrate muscle cells. Important among the smaller molecular components of cells are lipids, ATP (adenosine triphosphate), cyclic AMP (adenosine monophosphate), porphyrins, and water. Lipids are fatty substances that are a major component of cell membranes. ATP is the energy currency of the cell; this energy-rich molecule is formed when the cell needs to store energy and is broken down when the cell requires energy. Cyclic AMP functions as a regulator of cell activities; porphyrins are pigments essential for oxidation and photosynthesis. About 70 to 80 percent of a cell is water, which is vital to the chemistry of life.

There are two distinct types of cells: procaryotic cells, found only in blue-green algae and in bacteria, and eucaryotic cells, composing all other life forms. A eucaryotic cell consists of an outer membrane, cytoplasm that contains various membrane-bound structures (organelles), and a membrane-bound nucleus that encloses the gene-bearing chromosomes. Procaryotic cells have a cell membrane
and cytoplasm, but they have no nucleus (their genetic material is organized into a single chromosome) and they lack membrane-bound cytoplasmic organelles. The molecular composition and activities of the two types of cells, however, are very similar. A cell is bound by a semipermeable membrane (called the plasma membrane) that enables it to exchange certain materials with its surroundings. The plasma membrane is made up of a double layer of lipids studded with proteins. Some of the proteins extend completely through the lipid layer, others only partially penetrate it, and still others are thought to be completely embedded within the lipid layer. In plants the membrane is enclosed in a rigid cellulose cell wall. The space between cells is filled with the extracellular matrix, a gel of polysaccharides swollen with water molecules in which are suspended protein fibres that hold cells together to form tissues. Within the cytoplasm of both procaryotic and eucaryotic cells are ribosomes, small bodies that are the sites of protein synthesis. In addition, eucaryotic cells have a variety of separate membrane-bound cytoplasmic organelles with special functions. These organelles include the endoplasmic reticulum, Golgi apparatus, lysosomes, mitochondria, and plastids. The endoplasmic reticulum is a network of channels that functions in the movement of materials within the cell. Associated with these channels is the Golgi apparatus, which is composed of sacs that bud off from the endoplasmic reticulum. These sacs transport cell products from the endoplasmic reticulum to their appropriate locations either inside or outside the cell. Lysosomes are sacs filled with digestive enzymes; they are capable of digesting worn-out cell parts or extracellular debris, such as dead cells or foreign microorganisms that have been engulfed by the cell. Mitochondria serve as the power plants of the cell; it is within these organelles that ATP is synthesized. 
Plastids are found in the cells of most plants but are absent from animal cells. Of immense importance are the plastids known as chloroplasts; they contain the machinery for photosynthesis, the process by which the energy of sunlight is captured to produce carbohydrates.

The nucleus is the control centre of eucaryotic cells. Within this membrane-bound structure lie the chromosomes, which carry the hereditary material. The DNA of the chromosomes directs protein synthesis in the cell; the DNA instructions are carried from the nucleus to the cytoplasm by messenger RNA (mRNA). Procaryotic cells have no membrane-enclosed nucleus. They do, however, have nuclear matter consisting of a single chromosome.

A eucaryotic cell divides, or reproduces, to form two genetically identical daughter cells in a process called mitosis. Prior to mitosis, the chromosomes replicate, so that there will be a complete set of hereditary instructions for each daughter cell. During mitosis, the doubled chromosomes are separated, with one copy of each going to each daughter cell. Among sexually reproducing eucaryotes, another type of cell division occurs in the formation of sex cells called gametes (i.e., eggs and sperm). This process is known as meiosis. It produces four gametes, each of which contains half the number of chromosomes of the parent cell. When a male gamete and a female gamete unite, they form a new individual in which the full number of chromosomes is restored. Procaryotic cells reproduce in various ways, the most common being binary fission. This process involves replication of the cell's lone chromosome and the subsequent splitting of the parent cell into two daughters. It thus resembles mitosis in eucaryotes, but it lacks the special apparatus involved in true mitotic division.

The two main types of cell death are necrotic cell death, or coagulative necrosis, and apoptosis, or programmed cell death. Necrosis occurs in a variety of contexts produced by disease, injury, or accident and is cell death imposed by external factors. A cell undergoing necrosis typically swells in size before its lysosomes rupture and the cell's internal contents spill out into extracellular space. In response to specific intracellular and extracellular signals, cells can also undergo programmed cell death. This apoptosis is a normal cellular process that plays an important role in growth and development. This type of cell death is marked by the shrinking of the cytoplasm and nucleus, degradation of the chromosomes, and the final splitting of the nucleus into a number of membrane-bound fragments.

Approximate Chemical Composition of a Typical Mammalian Cell
Component                                                               Percent of total cell weight
Water                                                                   70
Inorganic ions (sodium, potassium, magnesium, calcium, chloride, etc.)  1
Miscellaneous small metabolites                                         3
Proteins                                                                18
RNA                                                                     1.1
DNA                                                                     0.25
Phospholipids and other lipids                                          5
Polysaccharides                                                         2

Biological Development
The progressive changes in size, shape, and function during the life of an organism by which its genetic potentials (genotype) are translated into functioning mature systems (phenotype). Most modern philosophical outlooks would consider that development of some kind or other characterizes all things, in both the physical and biological worlds. Such points of view go back to the very earliest days of philosophy. Among the pre-Socratic philosophers of Greek Ionia, half a millennium before Christ, some, like Heracleitus, believed that all natural things are constantly changing. In contrast, others, of whom Democritus is perhaps the prime example, suggested that the world is made up by the changing combinations of atoms, which themselves remain unaltered, not subject to change or development. The early period of post-Renaissance European science may be regarded as dominated by this latter atomistic view, which reached its fullest development in the period between Newton's laws of physics and Dalton's atomic theory of chemistry in the early 19th century. This outlook was never easily reconciled with the observations of biologists, and in the last hundred years a series of discoveries in the physical sciences have combined to swing opinion back toward the Heracleitan emphasis on the importance of process and development. The atom, which seemed so unalterable to Dalton, has proved to be divisible after all, and to maintain its identity only by processes of interaction between a number of component subatomic particles, which themselves must in certain aspects be regarded as processes rather than matter. Albert Einstein's theory of relativity showed that time and space are united in continuum, which implies that all things are involved in time; that is to say, in development. The philosophers who charted the transition from the nondevelopmental view, for which time was an accidental and inessential element, were Henri Bergson and, in particular, Alfred North Whitehead. 
Karl Marx and Friedrich Engels, with their insistence on the difference between dialectical and mechanical materialism, may be regarded as other important innovators of this trend, although the generality of their philosophy was somewhat compromised by the political context in which it was placed and the rigidity with which their later followers have interpreted it.

Philosophies of the Heracleitan type, which emphasize process and development, provide much more appropriate frameworks for biology than do philosophies of the atomistic kind. Living organisms confront biologists with changes of various kinds, all of which could be regarded as in some sense developmental; however, biologists have found it convenient to distinguish the changes and to use the word development for only one of them. Biological development can be defined as the series of progressive, nonrepetitive changes that occur during the life history of an organism. The kernel of this definition is to contrast development with, on the one hand, the essentially repetitive chemical changes involved in the maintenance of the body, which constitute "metabolism," and on the other hand, with the longer-term changes, which, while nonrepetitive, involve the
sequence of several or many life histories, and which constitute evolution. As with most formal definitions, these distinctions cannot always be applied strictly to the real world. In the viruses, for instance, and even in bacteria, it is difficult to make a distinction between metabolism and development, since the metabolic activity of a virus particle consists of little more than the development of new virus particles. In certain other cases, the distinction between development and evolution becomes blurred: the concept of an individual organism with a definite life history may be very difficult to apply in plants that reproduce by vegetative division, the breaking off of a part that can grow into another complete plant. The possibilities for debate that arise in these special cases, however, do not in any way invalidate the general usefulness of the distinctions as conventionally made in biology. Contents of this article: Introduction The scope of development Types of development Quantitative and qualitative development Progressive and regressive development Single-phase and multiphase development Structural and functional development Normal and abnormal development General systems of development Development of single-celled organisms Open and closed systems of development Blastogenesis versus embryogenesis Constituent processes of development Growth Morphogenesis Morphogenesis by differential growth Morphogenetic fields Morphogenesis by the self-assembly of units Differentiation Control and integration of development Phenomenological aspects Analytical aspects Development and evolution Effect on life histories Length and timing of the reproductive phase Recapitulation of ancestral stages Adaptability and the canalization of development Genetic assimilation Bibliography

The Human Body

The physical substance of the human organism, composed of living cells and extracellular materials and organized into tissues, organs, and systems. Human anatomy and physiology are treated in many different articles. For detailed coverage of the body's biochemical constituents, see Proteins; Carbohydrates; Lipids; Nucleic Acids; Vitamins; and Hormones. For information on the structure and function of the cells that constitute the body, see Cells. For detailed discussions of specific tissues, organs, and systems, see Blood; Circulation and Circulatory Systems: The human cardiovascular system; Digestion and Digestive Systems; Endocrine Systems: The human endocrine system; Excretion and Excretory Systems: The human excretory system; Integumentary Systems: The human skin; Muscles and Muscle Systems; Nerves and Nervous Systems; Reproduction and Reproductive Systems: The human reproductive system; Respiration and Respiratory Systems: Human respiration; Sensory Reception: Human sensory reception; Supportive and Connective Tissues: The human skeletal system. For a description of how the body develops, from conception through old age, see Growth and Development, Biological: Human growth and development. Many entries describe the body's major structures. For example, see abdominal cavity; adrenal gland; aorta; bone; brain; ear; eye; heart; kidney; large intestine; lung; nose; ovary; pancreas; pituitary gland; small intestine; spinal cord; spleen; stomach; testis; thymus; thyroid gland; tooth; uterus; vertebral column.

Human beings are, of course, animals--more particularly, members of the class Mammalia in the subphylum Vertebrata of the phylum Chordata. Like all chordates, the human animal has a bilaterally symmetrical body that is characterized at some point during its development by a dorsal supporting rod (the notochord), gill slits in the region of the pharynx, and a hollow dorsal nerve cord. Of these features, the first two are present only during the embryonic stage in the human; the notochord is replaced by the vertebral column, and the pharyngeal gill slits are lost completely. The dorsal nerve cord is the spinal cord in human beings; it remains throughout life. Characteristic of the vertebrate form, the human body has an internal skeleton that includes a backbone of vertebrae. Typical of mammalian structure, the human body shows such characteristics as hair, mammary glands, and highly developed sense organs.

Beyond these similarities, however, lie some profound differences. Among the mammals, only human beings have a predominantly two-legged (bipedal) posture, a fact that has greatly modified the general mammalian body plan. (Even the kangaroo, which hops on two legs when moving rapidly, walks on four legs and uses its tail as a "third leg" when standing.) Moreover, the human brain, particularly that part called the neocortex, is far and away the most highly developed in the animal kingdom. As intelligent as are many other mammals--such as chimpanzees and dolphins--none have achieved the intellectual status of the human species.

Contents of this article:

Introduction Chemical composition of the body. Organization of the body. Basic form and development. Effects of aging. Change incident to environmental factors.

Summary

Chemical composition of the body.

Chemically, the human body consists mainly of water and of organic compounds--i.e., lipids, proteins, carbohydrates, and nucleic acids. Water is found in the extracellular fluids of the body (the blood plasma, the lymph, and the interstitial fluid) and within the cells themselves. It serves as a solvent without which the chemistry of life could not take place. The human body is about 60 percent water by weight.

Lipids--chiefly fats, phospholipids, and steroids--are major structural components of the human body. Fats provide an energy reserve for the body, and fat pads also serve as insulation and shock absorbers. Phospholipids and the steroid compound cholesterol are major components of the membrane that surrounds each cell.

Proteins also serve as a major structural component of the body. Like lipids, proteins are an important constituent of the cell membrane. In addition, such extracellular materials as hair and nails are composed of protein. So also is collagen, the fibrous, elastic material that makes up much of the body's skin, bones, tendons, and ligaments. Proteins also perform numerous functional roles in the body. Particularly important are those cellular proteins called enzymes, which catalyze the chemical reactions necessary for life.

Carbohydrates are present in the human body largely as fuels, either as simple sugars circulating through the bloodstream or as glycogen, a storage compound found in the liver and the muscles. Small amounts of carbohydrates also occur in cell membranes, but, in contrast to plants and many invertebrate animals, humans have little structural carbohydrate in their bodies.

Nucleic acids make up the genetic materials of the body. Deoxyribonucleic acid (DNA) carries the body's hereditary master code, the instructions according to which each cell operates. It is DNA, passed from parents to offspring, that dictates the inherited characteristics of each human being.
Ribonucleic acid (RNA), of which there are several types, helps carry out the instructions encoded in the DNA.

Along with water and organic compounds, the body's constituents include various inorganic minerals. Chief among these are calcium, phosphorus, sodium, magnesium, and iron. Calcium and phosphorus, combined as calcium-phosphate crystals, form a large part of the body's bones. Calcium is also present as ions in the blood and interstitial fluid, as is sodium. Ions of phosphorus, potassium, and magnesium, on the other hand, are abundant within the intracellular fluid. All of these ions play vital roles in the body's metabolic processes. Iron is present mainly as part of hemoglobin, the oxygen-carrying pigment of the red blood cells. Other mineral constituents of the body, found in minute but necessary concentrations, include cobalt, copper, iodine, manganese, and zinc.

Organization of the body.

The cell is the basic living unit of the human body--indeed, of all organisms. The human body consists of more than 75 trillion cells, each capable of growth, metabolism, response to stimuli, and, with some exceptions, reproduction. Although there are some 200 different types of cells in the body, these can be grouped into four basic classes. These four basic cell types, together with their extracellular materials, form the fundamental tissues of the human body: (1) epithelial tissues, which cover the body's surface and line the internal organs, body cavities, and passageways; (2) muscle tissues, which are capable of contraction and form the body's musculature; (3) nerve tissues, which conduct electrical impulses and make up the nervous system; and (4) connective tissues, which are composed of widely spaced cells and large amounts of intercellular matrix and which bind together various body structures. (Bone and blood are considered specialized connective tissues, in which the intercellular matrix is, respectively, hard and liquid.)

The next level of organization in the body is that of the organ. An organ is a group of tissues that constitutes a distinct structural and functional unit. Thus, the heart is an organ composed of all four tissues, whose function is to pump blood throughout the body. Of course, the heart does not function in isolation; it is part of a system composed of blood and blood vessels as well. The highest level of body organization, then, is that of the organ system.
The body includes nine major organ systems, each composed of various organs and tissues that work together as a functional unit. The chief constituents and prime functions of each system are summarized below.

(1) The integumentary system, composed of the skin and associated structures, protects the body from invasion by harmful microorganisms and chemicals; it also prevents water loss from the body.
(2) The musculoskeletal system (also referred to separately as the muscle system and the skeletal system), composed of the skeletal muscles and bones (with about 206 of the latter in adults), moves the body and protectively houses its internal organs.
(3) The respiratory system, composed of the breathing passages, lungs, and muscles of respiration, obtains from the air the oxygen necessary for cellular metabolism; it also returns to the air the carbon dioxide that forms as a waste product of such metabolism.
(4) The circulatory system, composed of the heart, blood, and blood vessels, circulates a transport fluid throughout the body, providing the cells with a steady supply of oxygen and nutrients and carrying away such waste products as carbon dioxide and toxic nitrogen compounds.
(5) The digestive system, composed of the mouth, esophagus, stomach, and intestines, breaks down food into usable substances (nutrients), which are then absorbed into the blood or lymph; this system also eliminates the unusable or excess portion of the food as fecal matter.
(6) The excretory system, composed of the kidneys, ureters, urinary bladder, and urethra, removes toxic nitrogen compounds and other wastes from the blood.
(7) The nervous system, composed of the sensory organs, brain, spinal cord, and nerves, transmits, integrates, and analyzes sensory information and carries impulses to effect the appropriate muscular or glandular responses.
(8) The endocrine system, composed of the hormone-secreting glands and tissues, provides a chemical communications network for coordinating various body processes.
(9) The reproductive system, composed of the male or female sex organs, enables reproduction and thereby ensures the continuation of the species.


Figure 1: Structure of an information system.


Figure 3: A parsing graph.

Information Processing
Query languages

Figure 4: A semantic network representation.

The uses of databases are manifold. They provide a means of retrieving records or parts of records and performing various calculations before displaying the results. The interface by which such manipulations are specified is called the query language. Whereas early query languages were originally so complex that interacting with electronic databases could be done only by specially trained individuals, recent interfaces are more user-friendly, allowing casual users to access database information. The main types of popular query modes are the "menu," the "fill-in-the-blank" technique, and the structured query. Particularly suited for novices, the menu requires a person to choose from several alternatives displayed on the video terminal screen. The fill-in-the-blank technique is one in which the user is prompted to enter key words as search statements. The structured query approach is effective with relational databases. It has a formal, powerful syntax that is in fact a programming language, and it is able to accommodate logical operators. One implementation of this approach, the Structured Query Language (SQL), has the form

Figure 2: Document imaging.

Figure 5: The architecture of a networked information system.

select [field Fa, Fb, . . . , Fn]

from [database Da, Db, . . . , Dn] where [field Fa = abc] and [field Fb = def].

Structured query languages support database searching and other operations by using commands such as "find," "delete," "print," "sum," and so forth. The sentencelike structure of an SQL query resembles natural language except that its syntax is limited and fixed. Instead of using an SQL statement, it is possible to represent queries in tabular form. The technique, referred to as query-by-example (or QBE), displays an empty tabular form and expects the searcher to enter the search specifications into appropriate columns. The program then constructs an SQL-type query from the table and executes it.

The most flexible query language is of course natural language. The use of natural-language sentences in a constrained form to search databases is allowed by some commercial database management software. These programs parse the syntax of the query; recognize its action words and their synonyms; identify the names of files, records, and fields; and perform the logical operations required. Experimental systems that accept such natural-language queries in spoken voice have been developed; however, the ability to employ unrestricted natural language to query unstructured information will require further advances in machine understanding of natural language, particularly in techniques of representing the semantic and pragmatic context of ideas. The prospect of an intelligent conversation between humans and a large store of digitally encoded knowledge is not imminent.

Information searching and retrieval
State-of-the-art approaches to retrieving information employ two generic techniques: (1) matching words in the query against the database index (keyword searching) and (2) traversing the database with the aid of hypertext or hypermedia links. Key-word searches can be made either more general or more narrow in scope by means of logical operators (e.g., disjunction and conjunction).
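As a rough illustration of the select ... from ... where form, and of how conjunction narrows a search while disjunction broadens it, the following Python sketch uses the standard sqlite3 module. The table name records and the fields fa and fb are hypothetical stand-ins for the generic database and field names used in the text; this is one possible rendering, not the syntax of any particular product described here.

```python
import sqlite3

# Hypothetical in-memory table; "records", "fa" and "fb" are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, fa TEXT, fb TEXT)")
conn.executemany(
    "INSERT INTO records (fa, fb) VALUES (?, ?)",
    [("abc", "def"), ("abc", "xyz"), ("qrs", "def"), ("qrs", "xyz")],
)


def select_ids(where_clause, params):
    """Run 'select id from records where <where_clause>' and return the ids."""
    sql = f"SELECT id FROM records WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# Conjunction: both conditions must hold, so the result set is narrower.
narrow = select_ids("fa = ? AND fb = ?", ("abc", "def"))

# Disjunction: either condition may hold, so the result set is broader.
broad = select_ids("fa = ? OR fb = ?", ("abc", "def"))

print(narrow, broad)
```

With the four sample rows, the conjunctive query matches only the first record, while the disjunctive query matches the first three.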
Because of the semantic ambiguities involved in free-text indexing, however, the precision of the key-word retrieval technique (that is, the percentage of retrieved documents that are actually relevant) is far from ideal, and various modifications have been introduced to improve it. In one such enhancement, the search output is sorted by degree of relevance, based on a statistical match between the key words in the query and in the document; in another, the program automatically generates a new query using one or more documents considered relevant by the user. Key-word searching has been the dominant approach to text retrieval since the early 1960s; hypertext has so far been largely confined to personal or corporate information-retrieval applications.

The exponential growth of the use of computer networks in the 1990s presages significant changes in systems and techniques of information retrieval. In a wide-area information service, a number of which began operating at the beginning of the 1990s on the Internet computer network, a user's personal computer or terminal (called a client) can search simultaneously a number of databases maintained on heterogeneous computers (called servers). The latter are located at different geographic sites, and their databases contain different data types and often use incompatible data formats. The simultaneous, distributed search is possible because clients and servers agree on a standard document addressing scheme and adopt a common communications protocol that accommodates all the data types and formats used by the servers. Communication with other wide-area services using different protocols is accomplished by routing through so-called gateways capable of protocol translation. The architecture of a typical networked information system is illustrated in Figure 5. Several representative clients are shown: a "dumb" terminal (i.e., one with no internal processor), a personal computer (PC), and Macintosh (trademark; Mac) and NeXT (trademark) machines. They have access to data on the servers sharing a common protocol as well as to data provided by services that require protocol conversion via the gateways. Network news is such a wide-area service, containing hundreds of news groups on a variety of subjects, by which users can read and post messages.

Evolving information-retrieval techniques, exemplified by an experimental interface to the NASA space shuttle reference manual, combine natural language, hyperlinks, and key-word searching. Other techniques, seeking higher levels of retrieval precision and effectiveness, are studied by researchers involved with artificial intelligence and neural networks. The next major milestone may be a computer program that traverses the seamless information universe of wide-area electronic networks and continuously filters its contents through profiles of organizational and personal interest: the information robot of the 21st century.
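The relevance-sorted output mentioned earlier in this section can be sketched in a few lines of Python. The scoring rule used here (a simple count of query key-word occurrences per document) is an assumption chosen for brevity, and the sample documents and the set of "relevant" identifiers are invented for illustration; real systems use more refined statistical weights.

```python
from collections import Counter

# Invented sample collection; document ids and texts are illustrative only.
DOCS = {
    "d1": "query languages for relational databases",
    "d2": "hypertext links between databases",
    "d3": "relational databases and structured query syntax",
    "d4": "gardening in small spaces",
}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def rank(query, docs):
    """Return (doc_id, score) pairs, highest score first.

    The score is the number of times any query key word occurs in the
    document; documents that match no key word are left out.
    """
    terms = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)


results = rank("structured query databases", DOCS)
retrieved_ids = [doc_id for doc_id, _ in results]
print(results, precision(retrieved_ids, {"d1", "d3"}))
```

Sorting by score places the closest statistical match first, so a user who reads only the top of the list sees a more precise result than the unsorted set would give.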

Information only adds value to your organization if people can find the content they need, when they need it. Your users need the tools to search, navigate and view mission-critical information—whether it’s stored in a structured database down the hall, on a Web server across the street, or in a word processing document saved on a file server half-way around the world. They need an intuitive solution that can keep up with the increasing amount of information they create and use every day. They need the power of Verity K2 Enterprise.

Connects the Right Users with the Right Content at the Right Time
The most accurate, scalable infrastructure available to power corporate portals, Verity K2 Enterprise gives your users the tools they need to turn information overload into competitive advantage. K2 Enterprise delivers rapid, relevant information retrieval with Verity’s advanced search, while its Intelligent Classification features let you organize information the way you organize your business. This lets your users navigate directly to the information they need through K2 Enterprise’s advanced user interfaces. Behind your users’ browsers, K2 Enterprise’s open design ensures rapid integration with your existing e-business environment, while its scalable architecture gives your portal unlimited growth and reliable fault-tolerance. Regardless of how many documents are being searched or how many users are searching them, K2 Enterprise scales linearly with zero performance degradation. And its global support extends your portal to 24 languages and provides the flexibility to distribute content administration to the local offices that created and know it best.

Advanced Search
If your users can’t find information, they can’t act on it. That’s why the advanced Verity search, navigation, and viewing technologies that K2 Enterprise incorporates are so important to the success of your business. Using the robust Verity Query Language, you can implement these transparently to put the power of sophisticated queries behind simple, one-word searches. Novice users can get accurate results without using complex query syntax or understanding your corporate taxonomy. Features like smart correction of user errors, stemming expansion, query-by-example and automatic summarization guide your users to the information they need, even if they misspell search terms or don’t know where to start looking.

Intelligent Classification Portals powered by Verity K2 Enterprise can do more than search and retrieve specific information for your users. They can automatically organize your information assets to make them easier for your users to browse. Unlike automatic classification methods that rely solely on statistical clustering algorithms to group documents, Intelligent Classification combines machine efficiency with human intellect. Subject matter experts can refine the rules created by computers to apply business logic that can only be understood by the human mind.

Advanced User Interfaces Effective portal solutions make information as easy to find and retrieve for novice users as it is for experts familiar with advanced search techniques and corporate taxonomies. Verity K2 Enterprise provides your users with advanced user interfaces that make both unstructured and structured information assets readily accessible. For example, you can create directories based on your corporate terminology through which users can navigate and restrict searches to find unstructured content. Or you can utilize K2 Enterprise’s parametric search to let users sort, filter and drill through structured information.

Rapid E-business Integration

Verity K2 Enterprise is designed for rapid integration into existing e-business environments. Its straightforward integration leverages your current investments, minimizing implementation costs and ensuring project success. The key is Verity K2 Architecture, which supports technologies such as COM and Java, and includes a flexible API that provides access to all of its advanced features. K2 Enterprise also supports the widest range of information and repositories of any portal solution on the market. These include HTML, XML, multibyte data, Web and file systems and ODBC compliant databases.

Unlimited Growth Verity K2 Enterprise’s distributed architecture powers your portal with unlimited growth potential. By brokering searches, you can increase both the amount of information being searched and the number of users submitting queries— without any degradation in performance.

Fault Tolerant Operation Verity K2 Enterprise’s brokered search ensures your site will always be up and running by routing queries to servers that are best suited to the task. This distributes load evenly, ensuring that response time never suffers because one server is sitting idle while others are overloaded and isolating hardware failures to deliver uninterrupted service to your users enterprise-wide—24 hours a day, seven days a week.

Global Support Verity K2 Enterprise is the only portal infrastructure that supports true enterprise-wide and global scale solutions. Features include multiple language capabilities and built-in flexibility that allows administration to be distributed across different geographic locations. Multiple Language Support—All of K2 Enterprise’s components support multibyte character sets, which allows you to index, classify, search and view information in 24 Asian and European languages. By partnering with leading vendors like IBM, Inxight and Basis Technologies, Verity provides best-of-breed language locales to guarantee that K2 Enterprise always delivers the most advanced stemming, tokenization and concept extraction available. Flexible, Distributed Administration—By allowing you to distribute administration functions across geographic locations, K2 Enterprise puts administration of content in the hands of the groups that created and understand it best. Content can be administered on local servers, yet remain searchable enterprise-wide. Queries are transparently brokered to each local server, returning relevant results from across your enterprise with the performance of a single search engine.

The key to success in e-commerce is turning browsers into buyers, faster and more efficiently than your competitors can. Verity® K2 Catalog gives your e-commerce portal the power to do just that. By intuitively matching the right products to the right people, Verity Catalog dramatically increases sales and creates loyal customers who keep coming back for more.

The most effective, scalable infrastructure available to power e-commerce portals, Verity K2 Catalog ensures that your customers find exactly what they’re looking for—and more. Besides providing advanced Verity search that makes finding products on your site quick and easy, Verity K2 Catalog’s Intelligent Merchandising lets you influence purchasing decisions by suggesting related products, up-selling and promoting specific merchandise— adding profitable site stickiness to your online store. Adaptive personalization features take this a step further by tailoring the online shopping experience based on customer browsing patterns. Behind the shelves of your e-store, Verity K2 Catalog’s open design ensures rapid integration with your existing e-business environment. And its scalable architecture gives your e-commerce solution the capability to accommodate unlimited growth of both your catalog and customers with zero performance degradation. This means your customers can fill their shopping carts fuller and faster with Verity K2 Catalog—24 hours a day, seven days a week. Intelligent Merchandising E-commerce portals powered by Verity Catalog can do more than retrieve specific products for customers. They can influence purchasing decisions through sophisticated online merchandising techniques that increase sales and recognize more revenue. Verity Catalog’s Intelligent Merchandising leverages Verity’s Intelligent Classification technology to create online aisles through which you can guide your customers directly to the products you want to sell them. Or you can employ it to build business-rules that promote overstocked products, recommend items that complement the ones your customers are looking for, or suggest substitutes for out-of-stock merchandise.

Profitable Site Stickiness
Site stickiness isn’t just about keeping customers on your site longer. It’s about keeping them longer because they’re spending more money. Verity K2 Catalog profitably increases your site’s "stickiness" with intuitive, accurate search. This is one of the key advantages of portals powered by Verity, because if your customers can’t find what they’re looking for with a few clicks of their mouse, you’ll lose them to a site where they can.

Rapid E-Business Integration
Verity K2 Catalog is designed to fit within existing e-business environments. Its rapid integration leverages your current investments by minimizing implementation costs and decreasing time-to-market. In addition, only Verity K2 Catalog gives administrators the control and flexibility necessary to deliver the organized, relevant information customers need to make quick, informed purchasing decisions without costly administrative overhead or expensive content repurposing.

Adaptive Personalization
Instead of relying on static user profiles, Verity personalizes the online shopping experience by dynamically adapting to each search based on past queries and customer preferences. Specific products can be promoted based on previous purchasing history to provide the right match between products and customers, whether they’re shopping for themselves or someone else.

Unlimited Growth
Verity K2 Catalog’s scalability, fault-tolerance and wide range of supported data are the foundation of a solid e-commerce portal. This means your customers can rely on you to sell them the products they want, when they want them, no matter how many people are shopping at your site. And you can grow your e-commerce business one customer, or one million customers, at a time.
Scalability: Expand your catalog and handle more queries as your customer base grows, without any degradation in performance.
Fault-Tolerant Operation: Verity K2 Catalog’s brokered search ensures your site will always be up and running by routing queries to servers that are best suited to the task. This distributes load evenly, ensuring that response time never suffers because one server is sitting idle while others are overloaded, and isolates hardware failures to deliver uninterrupted service to your customers 24 hours a day, seven days a week.
Structured and Unstructured Information: Verity K2 Catalog supports the widest range of both structured and unstructured information and repositories of any portal solution on the market: HTML, XML, multibyte data, Web and file systems and ODBC databases.
Multiple Language Support: Optional Verity Locales give K2 Catalog the power to sell your products in 24 Asian and European languages by recognizing, filtering, indexing and searching selected international character sets.

SIM is ideally suited to web site content management, especially for web sites that need:

• Management of structured documents,
• Large data volumes (up to millions of documents),
• Web based workflow and release control, including the ability to preview changes and additions in place in the web site,
• Tightly integrated searching and table of contents support,
• Media asset management, where multimedia objects are cataloged with Dublin Core metadata and managed as a collected resource for the site,
• Dynamic presentation of documents which allows for customization based on user needs,
• Hypertext link creation and multimedia object embedding that is implemented independently of any word-processing package, greatly reducing integration costs for new editing packages,
• Hypertext link management that tracks all links, allowing change impact analysis and easy "what points at me?" checking,
• A choice of editing packages and approaches including MS Word, XML editors, SGML editors, HTML fill-in form support, and direct XML editing through a fill-in form (for administrators).

Public reference sites
To see the output of this web management system, visit the Textile Clothing and Footwear Australia site at TCFOZ. This site is maintained by non-technical content editors, who create content using Xmetal. Another web site running with SIM Web site content management is Standards Australia, who wrote extensions to the SIM system in the ACE programming language to meet their particular needs. Standards Australia use MS Word as their editing package, using the SIM RTF->XML translator to convert and manage those documents in XML format.

Key Characteristics
Web Server: SIM Web server (multithreaded server; ACE used for application logic).
Platforms: Windows NT, Solaris.
Code Base: ACE (SIM scripting language; an object-oriented, Java-like language with SGML/XML support).
User Interface: All user interfaces are provided with a standard web browser. Editing package is configurable.
Database Used: SIM Content Management Server (text retrieval database with native SGML/XML support). ODBC support is included in ACE, so content from other sources can be integrated.
Authentication Mechanism: The Web content management system currently maintains its own internal user database, but is being extended to support LDAP lookup for user authentication.
StyleSheet mechanism: The ACE language is ideally suited to XML->HTML conversion processing, as it is integrated with SGML and XML parsers (such as EXPAT, and sgmlp). The Web content management system does not currently support XSLT for stylesheeting, but the SIM group does have an XSLT engine in beta test, and it will be added as a supplementary mechanism in the future. One of the advantages of using ACE for stylesheeting is that it has powerful text manipulation features as well as XML/SGML support.
Workflow Support: Simple workflow support and release control is included. Documents can exist in a number of states including draft, pending review, released, suspended and deleted. Documents can be previewed on the web while in any status other than released; released documents are visible to other users. For complex workflow support, the SIM DMS (Document Management System) is available. This application supports complex workflow conforming to the Workflow Management Coalition standard, with a web user interface. The SIM DMS product is separately licensed, and is currently in beta release.

SIM Documentation Management Solution
One of the keys to successful electronic delivery of technical documentation is the ability to re-use content, that is, deliver content in a number of different ways from a single source. This allows the same document and document components to be used over and over again. Re-use guarantees consistency: every user sees the same, correct version of a document. Re-use means efficiency: a document is written once only. Re-use allows for refinement: a document can be developed over time. It also allows, for example, different customized views of the same source documentation to be delivered to different classes of users; similarly, it allows the same source documentation to be delivered in multiple formats.
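The idea of delivering one source in multiple formats can be sketched with Python's standard XML parser. This is only an illustration of single-source re-use: SIM itself uses the ACE language for this conversion, and the element names doc, title and para below are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source document; element names are illustrative.
SOURCE = (
    "<doc><title>Pump Maintenance</title>"
    "<para>Check the seal &amp; gasket.</para>"
    "<para>Replace worn parts.</para></doc>"
)


def to_html(root):
    """Render the source as a small HTML fragment."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in root.findall("para")]
    return "\n".join(parts)


def to_text(root):
    """Render the same source as plain text with an underlined title."""
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text for p in root.findall("para")]
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_view = to_html(root)
text_view = to_text(root)
print(html_view)
print(text_view)
```

Because both views are generated from the same parsed source, a correction to the XML appears in every delivery format without further editing.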
A Documentation Management Scenario
Consider, for example, a company that is producing a set of technical documents that are to be delivered to a number of different clients. Internet based delivery of the documents is one of the requirements; as a consequence, changes to any document will be immediately provided to customers via the web. Although the content to be delivered to each client may be substantially the same, there will typically be some differences. These differences may result from variations in the products the technical documents are describing. Also, the clients may wish to add annotations to their documentation, to reflect, for example, field knowledge obtained in using the manuals to repair various problems. In these cases, the annotations may represent valuable intellectual property of each of the clients, and customers will require that access to them be restricted to their own personnel. Thus the document repository to be delivered to the clients will generally consist of a core of common content, with additional content that is private to specific clients.

Documentation Components

Managing database content is more than just storing the raw text of documents and their accompanying figures. Documents can have internal structure, and there can be an external structure relating separate documents. For example, documents are often interlinked in a number of ways and these links are essential parts of the document content. When searching for documents, users often scan indexes to browse the terms contained in the document repository; these terms constitute the vocabulary of the document collection. Sophisticated users may also need to know the frequency of each of these terms in the document collection when conducting searches, in order to produce more effective queries. Documents can also have associated metadata that provides information about the document, such as author, or status, or security level. Metadata, too, can be used to drive more productive searches.

Customized Delivery and Effectivity
It is essential that the electronic publishing system deliver the correct document content, links and vocabulary to each class of users accessing the system. The need to provide an accurate snapshot of the database contents (i.e. text, figures, links, vocabulary and term frequencies) for each particular class of users is referred to as effectivity. Efficient provision of effectivity requires very sophisticated text database support.

Automatic Tables of Content
Another requirement for technical documentation is the ability to dynamically produce tables of content (TOCs) for each document from the XML document structure and content. Technical documents are often long, so that when viewing a fragment of a document, it is important to understand the location of that fragment in the context of the whole document. This can be achieved by displaying the TOC along with a document fragment, when the fragment is displayed. Since the documents change over time, it is necessary to generate these TOCs dynamically when the document is viewed.
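A dynamically generated table of contents of this kind can be sketched as a recursive walk over the XML structure. The manual, section and title element names and the id attribute are hypothetical, and the ">" marker is just one way to show where the fragment being viewed sits in the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document structure; element and attribute names are illustrative.
MANUAL = """
<manual>
  <section id="s1"><title>Safety</title>
    <section id="s1.1"><title>Protective Equipment</title></section>
  </section>
  <section id="s2"><title>Maintenance</title>
    <section id="s2.1"><title>Daily Checks</title></section>
    <section id="s2.2"><title>Annual Overhaul</title></section>
  </section>
</manual>
"""


def build_toc(elem, current_id=None, depth=0):
    """Return TOC lines for all nested sections under elem.

    The section whose id equals current_id is flagged with '>' so a reader
    can see where the displayed fragment sits in the document.
    """
    lines = []
    for sec in elem.findall("section"):
        marker = ">" if sec.get("id") == current_id else " "
        lines.append(f"{marker} {'  ' * depth}{sec.findtext('title')}")
        lines.extend(build_toc(sec, current_id, depth + 1))
    return lines


toc = build_toc(ET.fromstring(MANUAL), current_id="s2.1")
print("\n".join(toc))
```

Since the TOC is rebuilt from the XML each time it is requested, it stays correct when sections are added, removed or retitled.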
Dynamic Update
Technical documentation can involve very large document collections, which must be updated dynamically. This means that the delivery systems must provide a scalable solution, one that is able to update and deliver content efficiently for fast growing document collections.

Key Points
In summary some of the key requirements for a technical document delivery system include:

• The ability to repurpose content; for example, support multiple delivery formats from a single source,
• Manage all components of documentation, including content, images, internal structure, links, vocabulary and metadata,
• Support effectivity, namely deliver database snapshot appropriate to each class of users,
• Provide dynamic tables of content (TOC) from the XML document structure and content,
• Update and deliver documents quickly and efficiently,
• Provide powerful navigation searching and viewing, and
• Provide scalable solutions.
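The effectivity requirement in the list above can be illustrated with a small sketch in which each user class receives its own snapshot of documents, vocabulary and term frequencies. The repository layout, the visible_to field and the client names are all assumptions made for the example; a production text database would compute this far more efficiently than a linear scan.

```python
from collections import Counter

# Hypothetical repository: a shared core document plus private client notes.
REPOSITORY = [
    {"id": "core-1", "text": "replace the pump seal",
     "visible_to": {"client_a", "client_b"}},
    {"id": "note-a", "text": "seal fails early in cold climates",
     "visible_to": {"client_a"}},
    {"id": "note-b", "text": "pump vibration traced to loose mounting",
     "visible_to": {"client_b"}},
]


def snapshot(user_class):
    """Return the documents and term frequencies visible to one user class."""
    docs = [d for d in REPOSITORY if user_class in d["visible_to"]]
    term_freq = Counter(word for d in docs for word in d["text"].split())
    return {"doc_ids": [d["id"] for d in docs], "term_freq": term_freq}


snap_a = snapshot("client_a")
snap_b = snapshot("client_b")
print(snap_a["doc_ids"], snap_a["term_freq"]["seal"])
print(snap_b["doc_ids"], snap_b["term_freq"]["seal"])
```

Note that the term frequencies differ between the two snapshots: a private annotation never contributes to the vocabulary seen by another client.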

SIM Legislation Management Solution
The Nature of Legislation
The law is both complex and comprehensive. Not surprisingly, legislation databases are examples of large, very structured text collections. For example, a single Act of Parliament might be broken into many tens or hundreds of numbered sections, which in turn are broken into numbered subsections, paragraphs or sub-paragraphs. In large Acts these sections are grouped into chapters, parts, divisions and/or subdivisions, each with a label, and usually a section or title. A formal system of reference (or citation) allows each component of the database to be identified clearly and unambiguously.

Amendments to Legislation
An important characteristic of legislation is that it changes over time. Sections or even larger units can be added, removed or altered. New law may be handed down to become legislation, creating a new principal Act where no Act previously existed. Existing legislation may undergo a complete restructuring, creating a new Act or Acts, replacing those previously in place. In between such creation and replacement, amending Acts can specify alterations to the principal Acts, perhaps changing the wording of one or two sections, or replacing complete sections, or even removing or inserting whole parts or chapters.

Legislation's Temporal Nature
Although only the principal Acts and the amending Acts have legal force, lawyers and legal researchers need access to the law as it existed during the time period relevant to their particular problem. From time to time, authorized Government bodies issue consolidations of particular Acts. A consolidation represents current law, presenting the principal Act as modified by the relevant amending Acts; that is, with all additions, deletions, and changes to wording applied, and with all new components inserted. However, lawyers are often interested in the state of the law at times other than those for which officially released consolidations are available. Ideally, they would like to access consolidations of the law at arbitrary points in time.

Representing Structure with XML

The use of XML solves the problem of how to represent the structured text inherent in legislation. XML defines an abstract grammar for representation and exchange of text with tags interspersed throughout the text. A DTD (Document Type Definition) is a particular XML grammar describing which document components are valid and what sub-components they can contain. Acts from a given jurisdiction can be stored in XML in a format satisfying a particular DTD (which would state that every Act must contain sections and each section must contain text, or two or more subsections, and so on). One would then describe how to display a particular Act that satisfied the DTD by describing the presentation in terms of the DTD. A number of different presentation schemes can be described for a single DTD, so that one might specify a presentation which only displays the table of contents to a specified depth, as well as a presentation for the whole Act. This is one of the advantages most often cited for using XML: the ability to reuse the same information for multiple purposes.

Long-term Availability
For information like legislation that continually changes over time, XML provides a safe format for the archiving of documents. Utilities such as word processors often use proprietary formats and are unable to read legacy documents, even those authored by a previous version of the same word processor. These problems do not exist if XML is used, because only the content and structure of documents are represented by XML; the presentation of documents is treated separately.

An End-to-end Solution
Because the structure and content of the legislation are available to the application, in a form separate from presentation information, it is possible to develop powerful end-to-end solutions, not easily achievable if proprietary data representation standards are used.
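To make the idea of several presentations drawn from one structured source concrete, the following is a minimal Python sketch using only the standard library. The element names (act, part, section, subsection, text) and the label and title attributes are hypothetical and are not taken from any real legislative DTD; the sketch simply derives a depth-limited table of contents and a full-text view from the same XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical Act markup; element and attribute names are illustrative only.
ACT_XML = """
<act title="Example Act">
  <part label="Part 1" title="Preliminary">
    <section label="1" title="Short title">
      <text>This Act may be cited as the Example Act.</text>
    </section>
    <section label="2" title="Interpretation">
      <subsection label="(1)"><text>In this Act, "record" means a document.</text></subsection>
      <subsection label="(2)"><text>A reference to a record includes a copy.</text></subsection>
    </section>
  </part>
</act>
"""

# Elements treated as structural units for the table of contents (assumed).
STRUCTURAL = {"part", "section", "subsection"}


def table_of_contents(element, max_depth, depth=1):
    """Return indented heading lines for structural elements up to max_depth."""
    lines = []
    if depth > max_depth:
        return lines
    for child in element:
        if child.tag not in STRUCTURAL:
            continue
        heading = " ".join(filter(None, [child.get("label"), child.get("title")]))
        lines.append("  " * (depth - 1) + heading)
        lines.extend(table_of_contents(child, max_depth, depth + 1))
    return lines


def full_text(element):
    """Return every non-empty text fragment in document order (whole-Act view)."""
    return [t.strip() for t in element.itertext() if t.strip()]


act = ET.fromstring(ACT_XML)
toc = table_of_contents(act, max_depth=2)   # presentation 1: contents only
body = full_text(act)                       # presentation 2: the whole Act
```

Both views are computed from the same parsed tree, which is the reuse property the text attributes to XML; a production system would more likely express each presentation as a stylesheet than as hand-written traversal code.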
Using the Structured Information Manager, a legislation drafting and access system called EnAct was developed for the State Government of Tasmania in Australia. EnAct solves the second problem listed above for legislation databases: the ability to search legislation databases at an arbitrary point in time and view the correct consolidation of an Act at that point in time. Note that accessing legislation databases does not only involve viewing text. Legislation databases consist of a large number of interrelated documents linked together by hyperlinks. Viewing a consolidation of legislation at a particular point in time involves retrieving the correct text as well as the correct hyperlinks at that point in time.

SIM, XML, and Legislation: an Ideal Partnership
The EnAct system exemplifies the direction in which legislation databases will develop in the future, namely providing access to the correct state of the law at any point in time. EnAct is able to achieve this goal because the legislation is maintained in XML, allowing access to the structure and content of data, and because the SIM document management system, used for the development of EnAct, efficiently performs the operations on XML content required to achieve automatic consolidations.
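One common way to support point-in-time consolidation is to store every version of each section with the date range over which it was in force, and to select the versions valid on the requested date. The Python sketch below illustrates that general technique only; it is not a description of how EnAct or SIM implement consolidation, and the SectionVersion type, its fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SectionVersion:
    section: str                      # section number, e.g. "2"
    text: str                         # wording of this version
    valid_from: date                  # first day this wording is in force
    valid_to: Optional[date] = None   # exclusive end; None means still in force


def consolidate(versions, as_at):
    """Return {section: text} for the versions in force on the date as_at."""
    in_force = {}
    for v in versions:
        if v.valid_from <= as_at and (v.valid_to is None or as_at < v.valid_to):
            in_force[v.section] = v.text
    return dict(sorted(in_force.items()))


# Hypothetical history: section 2 is amended in 1995, section 3 is repealed in 1992.
history = [
    SectionVersion("1", "Short title.", date(1990, 1, 1)),
    SectionVersion("2", "Fee is $10.", date(1990, 1, 1), date(1995, 7, 1)),
    SectionVersion("2", "Fee is $20.", date(1995, 7, 1)),
    SectionVersion("3", "Transitional provision.", date(1990, 1, 1), date(1992, 1, 1)),
]

law_in_1991 = consolidate(history, date(1991, 6, 1))
law_in_1996 = consolidate(history, date(1996, 1, 1))
```

The same date filter would have to be applied to hyperlinks between documents, since the text notes that the correct links, as well as the correct wording, must be retrieved for the chosen date.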

SIM Intelligence Applications
In intelligence applications, it is normal to build and maintain an information repository fed from a number of sources and then conduct searches in order to locate relevant information. Such information repositories are in use in both military and commercial applications. Where the information is highly structured, conventional database management systems are used to maintain these data warehouses. Where the information consists of text and metadata, systems with advanced text database capabilities are required.

Large Scale, Dynamic Applications
In these applications, the information repository can range from a few gigabytes in size to hundreds of gigabytes or more. The repository may be static or, more typically, continually growing. For example, in the case of a news feed more than one gigabyte of new data can arrive over the course of every day. Other application areas may need to handle even greater dataflow. Some applications also need to migrate non-current data for archiving. For all large-scale high-load intelligence applications, high performance hardware/software architectures, such as multiprocessor Unix workstations, have to be deployed.

Building Information Repositories
The most important task when building an intelligence application is building and maintaining the information repository. When a new document is inserted into the repository, every word in the document must be extracted and indexed. This is a very expensive operation as a document may contain several thousand words. And, as noted, the amount of information to process can be very large indeed. SIM has been optimized for just such high volume environments, handling the update process as efficiently as is possible. Another problem is that new documents may be arriving at the same time as the database is in use for searching.
Although many existing text database systems support fast batch loading of data as an overnight operation when the database is off-line, they do not allow updates of the repository during the day when the database is in use. However, for any organization that requires up-to-date access to the most recent data, or access to its intelligence 24 hours a day, seven days a week, this is not acceptable. SIM has been specifically designed to support concurrent updates and queries, thereby providing 24 hour access to up-to-date information.

Searching Information Repositories
The reason for building an information repository is to provide access to the data it contains. Since the document collection can be very large, advanced search techniques are needed to locate desired information. SIM has been developed to support just such sophisticated searching. Queries can use Boolean logic, word position information (such as "same sentence", "same paragraph", "within n words"), document structure, and ranked relevance queries (where the documents are returned in order of relevance to the query) to locate target data. The query types can be combined as required. For example, to achieve high accuracy when querying a collection, a searcher could use a Boolean query to identify a subset of the collection and then rank that subset against a set of ranking terms. Fuzzy matching is also important: for example, it can be common to have several alternative spellings (or misspellings) of a word. SIM provides support for fuzzy matching by computing a distance measure between two terms, so that the presence of alternate spellings need not frustrate the user's task.

Repository Management
To maintain large, high-performance information repositories, the quality of a system's database administration capabilities is of the utmost importance. For very large repositories, it can be desirable to split the data collection over multiple databases. SIM has the ability to do just that, while retaining the ability to search each database in parallel. With critical information collections, it is necessary to be able to back up repositories efficiently and robustly, and to be able to monitor and refine database performance. SIM provides administration utilities that are of the very highest quality and reliability, and that deliver the finest level of control.

A Proven Track Record
SIM provides an advanced, extremely rich, extremely reliable set of capabilities that support high-performance, secure intelligence applications. SIM has been successfully adopted by the Departments of Defense of both Australia and the U.S.A. for managing and searching large repositories of information.
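The fuzzy matching described above relies on a distance measure between two terms. The text does not say which measure SIM uses; the Python sketch below assumes the Levenshtein edit distance purely for illustration, and the function names and sample vocabulary are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_matches(term, vocabulary, max_distance=1):
    """Return the indexed words within max_distance edits of term, sorted."""
    term = term.lower()
    return sorted(w for w in vocabulary if edit_distance(term, w.lower()) <= max_distance)


# Hypothetical index vocabulary containing alternative spellings.
vocabulary = ["colour", "color", "collar", "organisation", "organization"]
color_variants = fuzzy_matches("color", vocabulary)
org_variants = fuzzy_matches("organisation", vocabulary)
```

With a threshold of one edit, "colour" is accepted as a variant of "color" while "collar" (two edits away) is not; the threshold is a tuning choice that trades recall against false matches.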

SIM Knowledge Base Solution
Whether in the form of a human service or embodied as a physical product, ultimately, knowledge is every corporation's stock in trade. The pooled knowledge of an enterprise is its fundamental capital, its true wealth. Knowledge management is about leveraging corporate knowledge: identifying it wherever it may be found, storing it for re-use, and delivering it to where it is needed. SIM's advanced content management enables organizations to do just that. By focusing on content, SIM transforms opaque, non-functional documents into richly structured information sources. SIM's support of sophisticated content, structure, time, and metadata querying opens up the organizational knowledge base. And SIM's high-performance database management and web delivery enable it to deliver the right information to the right people at the right time.

A Simple Model
Consider an organization that builds two databases over time and matches the documents from one against the contents of the other. One database represents knowledge, the other needs. This simple model of a knowledge base is applicable to many practical situations.

Sample Applications
For example, the human resources department of an organization might have one database of tasks needing to be performed, and another describing the qualifications and expertise of current staff. The department wishes to assign the most appropriate employee to each task. In order to make the assignments, it is necessary to match the tasks against the expertise database. In order to determine where an employee may best be deployed, the complementary action of matching the expertise of the employee against the database of tasks can be performed. Similar requirements exist in an employment agency, in real estate management, and in other information gathering and analysis applications.

A Detailed Example
Another example that fits this model is the administration of grant applications by a research body that is responsible for determining which grant applications should receive funding. A panel of experts has overall responsibility for recommending applications for funding. In order to do so, it is necessary to assess each application; accordingly, the panel must assign each application to an appropriate expert assessor. There are many types of interaction with such a system. Grant applications are submitted by applicants or by their organizations. Assessors, possibly from all over the world, must submit their reports and update their personal details. Members of the panel require full access to information about applications and assessors. A team of administrators may need access for general system maintenance or to generate reports.

Matching Information against Needs
There will typically be tens of thousands of assessors and applications covering a very wide range of research areas. In line with our simple model, a database of the submitted grant applications and a database of the assessors who may be approached to review applications must be built. A difficult task facing the panel of experts is choosing the appropriate assessors to review an application. SIM can use advanced relevance matching to help with this problem. In this approach, the text of an application is matched against the expertise of the potential assessors. Assessors who describe their expertise in terms similar to those used in an application are likely to be appropriate reviewers. With a single relevance query, an application can be matched against the complete database of assessors, and a ranked list of the closest matching assessors returned. The stronger the correlation between assessor and application, the higher the assessor is ranked. The panel of experts can then examine this ranked list and allocate assessors appropriately.
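The matching of an application against a database of assessors can be illustrated with a standard ranked-retrieval technique. The Python sketch below assumes TF-IDF weighting with cosine similarity, which is one common choice and not necessarily the relevance measure SIM uses; the assessor names, expertise statements, and application text are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rank_assessors(application, assessors):
    """Return (name, score) pairs sorted by descending cosine similarity."""
    docs = {name: Counter(tokenize(text)) for name, text in assessors.items()}
    n = len(docs)
    # Document frequency: in how many expertise statements each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

    def weigh(counts):
        # Terms never seen in the assessor database carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = (math.sqrt(sum(w * w for w in u.values()))
                * math.sqrt(sum(w * w for w in v.values())))
        return dot / norm if norm else 0.0

    query = weigh(Counter(tokenize(application)))
    scores = [(name, cosine(query, weigh(counts))) for name, counts in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical expertise database and application text.
assessors = {
    "assessor_a": "marine biology, coral reef ecology and fisheries",
    "assessor_b": "database systems, text retrieval and indexing",
    "assessor_c": "medieval history and manuscript studies",
}
application = "A study of indexing methods for large text retrieval databases"
ranking = rank_assessors(application, assessors)
```

Here the assessor whose expertise statement shares vocabulary with the application is ranked first and the others score zero. Swapping the roles of the two databases (ranking applications against one assessor's expertise) gives the complementary match described in the simple model.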

Knowledge Base Requirements
To develop such a knowledge base, the three important requirements are:
• A sophisticated content management system with advanced information retrieval and relevance matching capabilities,
• Web-based access to accommodate users who will be geographically dispersed throughout the world, and
• High performance, including the ability to handle large volumes of data, and the ability to cope with heavy, peak interactive loads.

SIM and Knowledge Base
SIM technology has been successfully deployed to build knowledge databases. Our experience has been that the provision of web-based access has meant that the application is readily available to users, and the use of relevance matching between databases has led to significantly improved decision making within the organization. SIM, the Structured Information Manager, delivers the enabling technology for the key components of knowledge management: storing knowledge, ensuring that it can be located, and delivering it to where it's needed, when it's needed.

SIM Metadata Repository Management Solution
SIM MetaSite is a comprehensive solution for the collection, validation, classification and searching of metadata. SIM MetaSite forms a metadata repository which describes a distributed collection of resources, and provides a powerful browsing and searching interface to that repository. Both a simple and an advanced Web searching interface are provided with the SIM MetaSite product, to satisfy the needs of the general public or specialized users. Also, two lower level system interfaces are provided to the repository (one using HTTP and one using Z39.50) to allow integration of the SIM MetaSite product into existing environments.

Metadata Repository
SIM MetaSite stores and manages a database of resource metadata. The resource metadata is stored in a standard XML format (RDF). The Metadata repository is managed dynamically, and can be updated while users are querying the system.
A web interface is provided for management of batch operations, and interactive updates, deletes and insertions. The Metadata repository is searchable and browsable on all Dublin Core fields. The Metadata repository can also support non-Dublin Core fields in a dynamic manner, allowing the system to change or evolve as standards change without programmer intervention. SIM MetaSite can handle metadata databases of very large size (>20 Gbytes). The Metadata repository includes fields for tracking of popularity/usage of metadata records. This information can be used to improve the visibility of metadata resources that are visited most often. In addition to searching on metadata, the Metadata repository is capable of containing the full text of resources, where searching of full text in combination with metadata fields is required.

MetaSite User Interface
SIM MetaSite is supplied with a user interface which allows metadata and full text searching along with thesaurus browsing, all in an easy to use HTML interface. The interface allows frames or no-frames, JavaScript or no-JavaScript operation. The provided user interface is highly configurable, which allows the MetaSite interface to evolve along with changing or clarified user requirements. The interface is stateless, yet allows user customization of the operation of the interface (for example in selection of the thesaurus to be used).

Creation and Loading of Metadata
The MetaSite crawler collects resources from the web in order to build the metadata repository. Its operation can be controlled in many ways:
• by regular expression for included URLs,
• by regular expression for excluded URLs,
• by MIME type of data to be collected,
• by number of steps that can be followed (depth) from the starting configuration file URL, and
• by number of steps that can be followed off-site from a valid on-site URL.
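The five crawl controls listed above can be read as a single decision applied to each candidate URL. The Python sketch below shows one way such a decision might be expressed; the CrawlRules type, its field names, and the should_fetch function are hypothetical and do not reflect the MetaSite crawler's actual configuration format.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class CrawlRules:
    include: str             # regular expression a URL must match
    exclude: str             # regular expression that rejects a URL
    mime_types: frozenset    # MIME types to collect
    max_depth: int           # steps allowed from the starting URL
    max_offsite_steps: int   # steps allowed away from a valid on-site URL


def should_fetch(rules, start_url, url, mime_type, depth, offsite_steps):
    """Apply the five controls in turn; return True if the URL may be collected."""
    if depth > rules.max_depth:
        return False
    if mime_type not in rules.mime_types:
        return False
    if not re.search(rules.include, url) or re.search(rules.exclude, url):
        return False
    on_site = urlparse(url).netloc == urlparse(start_url).netloc
    return on_site or offsite_steps <= rules.max_offsite_steps


# Hypothetical rule set and starting point.
rules = CrawlRules(
    include=r"^https?://",
    exclude=r"/private/",
    mime_types=frozenset({"text/html", "application/rdf+xml"}),
    max_depth=3,
    max_offsite_steps=1,
)
START = "http://example.org/index.html"

on_site_ok = should_fetch(rules, START, "http://example.org/docs/a.html", "text/html", 2, 0)
excluded = should_fetch(rules, START, "http://example.org/private/a.html", "text/html", 1, 0)
```

In this sketch a site is identified by its host name, which is an assumption; a real crawler might instead treat a URL prefix or a set of hosts as "on-site".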

The crawler is multi-threaded, and can crawl multiple sites simultaneously. Delays between requests can be configured in order to reduce load on harvested sites. The crawler conforms to the ROBOTS.TXT standard for inclusion and exclusion. The crawler understands the RDF standard, and can follow links expressed in the RDF standard. Because the crawler is open and configurable, new data types and document types can be supported. Where data is located locally, the crawler will not duplicate such data, but will record references to the local data, thus allowing the repository to be populated without data duplication. The MetaSite crawler also includes a configurable validation program. This program is usually run in batch mode after the data has been collected. The validation program checks the RDF data for validity, and can perform operations such as setting default values for metadata fields according to configurable rules (that is, automatic generation of metadata), checking keyword entries against a central thesaurus, detecting duplicate data, and translating from META tags in HTML documents into RDF expressed in XML format.

Thesaurus Support

MetaSite allows the management of multiple thesaurus databases. These are used for validation of metadata records, and for browsing and searching within the user interface. Where multiple thesauri are loaded, users can dynamically choose which thesaurus to use, depending on their preferences. Thesaurus access is tightly integrated into the user interface; it helps significantly in targeting user queries and in giving the user a sense of the overall content of the metadata repository. When the user browses through the thesaurus, the number of records within the metadata repository that correspond with each category in the thesaurus is displayed. When the user conducts a search, the search results for each thesaurus category are shown. Searching accuracy is also enhanced by using the thesaurus to expand the user's query to include synonyms for the user's query terms. The query terms that were included are displayed to the user, and the user can choose to disable this functionality if they wish. Thesauri are also used for validation of records during the loading process, for example for checking that restricted vocabularies are adhered to. This can also be used to map large vocabularies down to restricted vocabularies using the "alternate" field in the thesaurus. Thesaurus entries are stored in a standard XML format, making it easy to export and import new thesauri. The thesaurus records are completely dynamically maintainable on-line, through an administration web interface. The thesaurus databases themselves are fully accessible via Z39.50, and the schema used for that access is configurable for the particular site requirements.

Open and Interoperable
A low-level HTTP interface is provided to SIM MetaSite, which allows embedding of the MetaSite functionality into other web interfaces. The low-level API allows access to most of the searching and presentation functionality of SIM MetaSite. SIM MetaSite is also fully accessible via Z39.50, and the schemas used for that access can be configured for the particular site requirements; indeed, multiple Z39.50 schemas can be used simultaneously for MetaSite access.
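The thesaurus behaviour described earlier (expanding a query with synonyms, showing the user which terms were added, and mapping a large vocabulary down to a restricted one through "alternate" entries) can be sketched in a few lines of Python. The data layout and function names below are hypothetical; MetaSite stores its thesauri in XML, and this in-memory dictionary is only a stand-in for illustration.

```python
# Hypothetical thesaurus: preferred term -> list of alternate terms.
THESAURUS = {
    "automobile": ["car", "motor vehicle"],
    "aircraft": ["aeroplane", "airplane"],
}


def expand_query(terms, thesaurus, enabled=True):
    """Return (expanded_terms, added_terms); added_terms can be shown to the user."""
    expanded = list(terms)
    added = []
    if not enabled:                      # the user has switched expansion off
        return expanded, added
    for term in terms:
        for preferred, alternates in thesaurus.items():
            group = [preferred, *alternates]
            if term in group:
                for candidate in group:
                    if candidate not in expanded:
                        expanded.append(candidate)
                        added.append(candidate)
    return expanded, added


def to_preferred(term, thesaurus):
    """Map a term to its preferred (restricted-vocabulary) form, or None if unknown."""
    for preferred, alternates in thesaurus.items():
        if term == preferred or term in alternates:
            return preferred
    return None


expanded, added = expand_query(["car", "hire"], THESAURUS)
unexpanded, none_added = expand_query(["car", "hire"], THESAURUS, enabled=False)
```

Returning the added terms separately mirrors the described behaviour of displaying the included query terms, and the enabled flag corresponds to the user's option to disable expansion. The to_preferred lookup doubles as a validation check during loading: a keyword that maps to None is outside the restricted vocabulary.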

SIM Web site content management
SIM is ideally suited to web site content management, especially for web sites that have a need for:
• Management of structured documents,
• Large data volumes (up to millions of documents),
• Web based workflow and release control, including the ability to preview changes and additions in place in the web site,
• Tightly integrated searching and table of contents support,
• Media asset management, where multimedia objects are Dublin Core metadata cataloged and managed as a collected resource for the site,
• Dynamic presentation of documents which allows for customization based on user needs,
• Hypertext link creation and multimedia object embedding that is implemented in a completely word-processing package independent manner, greatly reducing integration costs for new editing packages,
• Hypertext link management that tracks all links, allowing change impact analysis and easy "what points at me?" checking,
• A choice of editing packages and approaches including MS Word, XML editors, SGML editors, HTML fill-in form support, and direct XML editing through a fill-in form (for administrators!)