NOISE TOLERANT OPTICAL CHARACTER RECOGNITION SYSTEM

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to application Ser. No. 07/875,000, which is a continuation of 07/599,522 of Dan S. Johnson for Noise Tolerant Optical Character Recognition System, filed Oct. 17, 1990; application Ser. No. 07/705,838 of Oscar A. Zuniga for Automatic Separation of Text from Background in Scanned Images of Complex Documents, filed May 28, 1991, now pending; and application Ser. No. 07/898,392, now U.S. Pat. No. 5,179,599, of Lynn J. Formanek for Dynamic Thresholding System for Documents Using Structural Information of the Documents, filed Jun. 17, 1991; all owned by the same entity.

FIELD OF THE INVENTION

This invention relates to pattern recognition systems and more particularly to computerized pattern recognition systems. Even more particularly, the invention relates to computerized optical character recognition systems.

BACKGROUND OF THE INVENTION

Optical character recognition, or OCR, is the process of transforming a graphical bit image of a page of textual information into a text file wherein the text information is stored in a common computer processable format, such as ASCII. The text file can then be edited using standard word processing software.

In the process of transforming each of the characters on the page from a graphical image into an ASCII format character, prior art OCR methods first break the graphical page image into a series of graphical images, one for each character found on the page. They then extract high level features of each character and classify the character based on those features.
If the characters on the page are of a high quality, such as an original typed page, these simple processing methods will work well for the process of converting the characters. However, as document quality degrades, such as through multiple generations of photocopies, carbon copies, facsimile transmission, or in other ways, the characters on a page become distorted, causing simple processing methods to make errors. For example, a dark photocopy may join two characters together, causing difficulty in separating these characters for the OCR processing. Joined characters can easily cause the process that segments characters to fail, since any method which depends on a "gap" between characters cannot easily distinguish characters that are joined.

Light photocopies produce the opposite effect. Characters can become broken, and appear as two characters, such as the character "u" being broken in the bottom middle to create two characters, each of which may look like the "i" character. Also, characters such as the letter "e" may have a segment broken to cause them to resemble the character "c".

Early prior art OCR methods did not extract character features from a character; instead they simply compared a graphical bit map of the character to a template bit map of a known character. This method was commonly called "matrix matching". One problem with matrix matching is that it is very sensitive to small changes in character size, skew, shape, etc. Also, this technology was not "omni font", that is, it had to be carefully trained on each type font to be read and would not generalize easily to new type fonts.

To solve the "omni font" problem, prior art methods began to extract higher level features from a character image.
The goal was to select a set of features which would be insensitive to unimportant differences, such as size, skew, presence of serifs, etc., while still being sensitive to the important differences that distinguish between different types of characters. High level features, however, can be very sensitive to certain forms of character distortion. For example, many feature extractors detect the presence of "closures", such as in the letters "e", "o", "b", "d", etc., and the feature extractors use this information to classify the character. Unfortunately, a simple break in a character can easily cause a closure to disappear, and the feature extractor method that depends on such closures would probably classify the character incorrectly.

Often the high level feature representation of a character contains very few features. Therefore, when a feature is destroyed, such as a break in a closure, there is insufficient information left to correctly classify the character.

There is need in the art then for an optical character recognition system that classifies characters by creating a set of features that is insensitive to character segmentation boundaries. There is further need in the art for such a system that creates features having a low enough level to be insensitive to common noise distortions. Another need in the art is for such a system that creates a sufficient number of features that some will remain to allow character classification even if others are destroyed by noise. A still further need in the art is for such a system that provides a set of features that are insensitive to font variations. The present invention meets these needs.

A description of other aspects of OCR can be found in the following applications:
(a) Application Ser. No. 07/599,522 of Dan S. Johnson for Noise Tolerant Optical Character Recognition System, filed Oct. 17, 1990;
(b) Application Ser. No. 07/705,838 of Oscar A.
Zuniga for Automatic Separation of Text from Background in Scanned Images of Complex Documents, filed May 28, 1991; and
(c) Application Ser. No. 07/898,392, of Lynn J. Formanek for Dynamic Thresholding System for Documents Using Structural Information of the Documents, filed Jun. 17, 1991;
each of which is specifically incorporated herein by reference for all that is disclosed therein.

SUMMARY OF THE INVENTION

It is an aspect of the present invention to provide a system for recognizing textual characters from a bit image of a page of text.

It is another aspect of the invention to define a set of features for each of the characters on the page of text.

Another aspect is to define such a set of features that are at a low enough level that they are insensitive to common noise distortions.

Yet another aspect is to define such a set of features for each character within a word so that if a few features are destroyed by noise, the remaining features will still be sufficient to yield a correct classification.

A further aspect of the invention is to provide a set of features that are at a high enough level that they are insensitive to font variations, such as size, shape, skew, etc.

The above and other objects of the invention are accomplished in a method of optical character recognition that first segments a page image into character images. A set of features is extracted by traversing the outlines of the dark regions in a character image, keeping the dark area to the left, to identify small sections called features. Once extracted, the features of the unknown character are compared to features of a prototype character from a template in order to classify the unknown character, and convert it into a character code. The features of the prototype character are called proto-features.

The comparison of the features and the proto-features is performed by selecting the features from a character image to be analyzed.
Next, one of the templates is selected and each of the features from the character image is compared to each of the proto-features in the template to create an average feature match evidence. Each of the proto-features is then compared to each of the features to create an average proto match evidence. These two evidences are then summed and divided by the total feature and proto-feature lengths to create a match rating. The features are then compared to all other templates to create a match rating list which is sorted into descending order. The top match rating is selected for output, and any ratings that are very close to the top rating are also output to allow a dictionary look-up routine or a lexical analyzer to make the final selection.

To create the match evidences, the angles and lengths of the features and proto-features are compared and then the result is normalized to a specific range of values.

The above and other objects, features, and advantages of the invention will be better understood by reading the following more particular description of the invention, presented in conjunction with the following drawings, wherein:

FIG. 1 shows an example of character distortions that commonly occur because of noise and illustrates the problems solved by the present invention;
FIG. 2 shows a set of template proto-features for the letters "o" and "I";
FIG. 3 shows a diagram of a proto-feature created for a template character;
FIG. 4 shows an example set of features that could be extracted from analyzing the letters "o" and "I";
FIG. 5 shows a diagram of a feature extracted from a character;
FIG. 6 shows a block diagram of the hardware of the present invention;
FIG. 7 shows a flow diagram of the overall process of the present invention;
FIG. 8 shows a flowchart of the extract features process of FIG. 7; and
FIGS. 9-12 show a flowchart of the classify character process of FIG. 7.
DESCRIPTION OF THE PREFERRED EMBODIMENT

The following description is of the best presently contemplated mode of carrying out the present invention. This description is not to be taken in a limiting sense but is made merely for the purpose of describing the general principles of the invention. The scope of the invention should be determined by referencing the appended claims.

Optical character recognition, or OCR, is a process that transforms an image of a page of textual information into a text file on a computer. The text file can then be edited using standard word processors on the computer system. The process first involves "training" the OCR machine to segment characters within a page image, extract character features and build a set of templates for each class of characters. For example, a class for the character "a" might include a template for each font the OCR machine is capable of recognizing. After the templates have been created, a page of unknown textual information is scanned, the characters are segmented, features from each of the characters are extracted and then these features are compared to the templates created earlier in order to classify the characters.

The training process is usually performed by the designers of the OCR machine. Its purpose is to "teach" the machine what the "shapes" of the different characters look like. When several character templates match the incoming character fairly well, the character classifier can output several choices for the character. Other processes, such as dictionaries and lexical rules, can be used to decide between the choices for a particular character. Therefore, it is important for the character classifier within an OCR machine to be able to pass multiple choices when it is unsure of a character classification.

On high-quality documents, simple algorithms will work well for segmentation, feature extraction, and character classification.
However, as document quality degrades, the characters on the page become distorted and simple algorithms begin to make errors. Causes of document quality degradation include multiple generation photocopies, small point sizes, carbon copies, fax, dot matrix printers, etc.

FIG. 1 shows an example of character distortions that commonly occur because of noise and illustrates the problems solved by the present invention. Referring now to FIG. 1, the characters enclosed by the dotted outline 102 are the characters "r" and "i" which have been "joined" because of noise, as might occur, for example, by a dark photocopy. The character within the dotted outline 104 is the character "u" which has been broken because of noise such as might be caused by a light photocopy. A light photocopy of the character "e" could also result in the outline enclosed in the dotted line 106. This type of noise distortion might cause prior art methods to classify this character as a "c".

To solve the character classification problems defined by the characters of FIG. 1, the present invention uses a new method of optical character recognition that first segments a page image into character images. Character image separation is well known in the art, and will not be further described here. The present invention obtains a set of features by extracting the outlines of the "black" regions in a character image, and then further dissecting each outline into small sections called features.

Early OCR methods did not use a feature extractor. They simply compared the bit map image of the character against the template bit maps of known characters. This method was commonly called "matrix-matching". The problem with matrix-matching is that it is too sensitive to small changes in character size, skew, shape, etc. The technology was not "omni-font". That is, it had to be carefully trained on each font to be read and would not generalize well to new fonts.
To solve the omni-font problem, designers of OCR machines began to extract higher level features from the character image. Some example high level features are the "closures" such as in "e", "a", "b", "d", etc., or "bays" such as in "n", "u", "m", etc. The goal was to select a set of features which would be insensitive to unimportant differences such as size, skew, presence of serifs, etc., while still being sensitive to the important differences that distinguish between characters of different fonts.

The problem with high level features is that they can be very sensitive to certain forms of character distortion. For example, a simple break in a character, such as the "e" 106 of FIG. 1, can easily cause a closure to disappear. Any method that depended heavily on closures would probably classify this character as a "c".

The solution of the present invention is to use features which are at a lower level than closures, bays, etc., but at a higher level than bit maps. The invention uses very small features, features which approximate the size of the smallest possible outline segment which can still convey meaningful information about a character. These features are compared to prototype features in the character templates. These prototype features are also called proto-features. In the preferred embodiment, the proto-features are the same size or larger than features; however, in other embodiments, they could be smaller. Also, in other embodiments, larger features could be used.

The invention defines proto-features within a template to be an approximation of the outline of a character. In the preferred embodiment, a straight line approximation is used; however, other approximations, such as, for example, arcs, could be used.

FIG. 2 shows a set of template proto-features for the letters "o" and "I". Each straight line segment corresponds to one proto-feature. Referring now to FIG. 2, the letter "o" is shown having proto-features 202 and 204.
The proto-features are formed by starting at a point on the outline of the character and traversing the character in a direction such that the black area of the character is on the left side of the arrow. Proto-features are formed using the eight points of the compass, i.e. at 0 degrees, 45 degrees, 90 degrees, etc.; therefore, when the outline of the character changes to a new eighth compass point, a new proto-feature will be started. In the case of the letter "o", there are 8 proto-features on the outside of the outline and 8 proto-features on the inside of the outline. In forming the proto-features for the letter "I", the arrows traverse in the same manner and continue as long as a straight line approximation is appropriate. In this manner, proto-feature 206 is created along the top outline of the character, and when the character outline changes direction downward, proto-feature 208 is created. As the outline swings inward, proto-feature 210 is created, and as the outline descends vertically, a very long proto-feature 212 is created. Proto-features continue to be created in this manner until the entire outline of the character is traversed.

FIG. 3 shows an example of how proto-features are defined. Referring now to FIG. 3, a proto-feature, as represented by arrow 302, contains a midpoint 303 which is represented by its Xp and Yp coordinates 304 and 306. The angle θp 308 of the proto-feature 302 is recorded relative to the direction east. That is, east is 0 degrees, with degrees being counted counterclockwise until a full circle is complete. The degrees of the angle are converted to a value between 0 and 1, where 0 represents 0 degrees and 1 represents 360 degrees. A length 310 of the proto-feature 302 is also recorded for the proto-feature. Thus a proto-feature is defined by the X-Y coordinates of its center, its angle, and its length.
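The proto-feature record described above can be sketched in a few lines of Python. This is only an illustration, not part of the disclosed embodiment: the names `ProtoFeature`, `degrees_to_unit`, and `proto_from_endpoints` are hypothetical, and the sketch assumes a coordinate system in which Y increases upward, so that angles grow counterclockwise from east.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtoFeature:
    """Hypothetical record for one straight-line proto-feature."""
    x: float       # X coordinate of the midpoint
    y: float       # Y coordinate of the midpoint
    angle: float   # direction in the 0..1 range (0 = east, counterclockwise)
    length: float  # length of the straight-line segment


def degrees_to_unit(degrees: float) -> float:
    """Convert degrees to the 0..1 range; 360 degrees wraps to 0 (same direction)."""
    return (degrees % 360.0) / 360.0


def proto_from_endpoints(x0: float, y0: float, x1: float, y1: float) -> ProtoFeature:
    """Build a proto-feature from the start and end points of an outline segment.

    Assumes Y increases upward; the direction runs from (x0, y0) to (x1, y1).
    """
    dx, dy = x1 - x0, y1 - y0
    degrees = math.degrees(math.atan2(dy, dx))
    return ProtoFeature(
        x=(x0 + x1) / 2.0,
        y=(y0 + y1) / 2.0,
        angle=degrees_to_unit(degrees),
        length=math.hypot(dx, dy),
    )
```

For instance, a segment running straight up from (0, 0) to (0, 2) would be stored with midpoint (0, 1), an angle of about 0.25 (90 degrees), and a length of 2.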
Additional parameters are derived for each proto-feature to improve computational speed when comparing features to proto-features. These parameters are defined as A, B, C, Xmin, Xmax, Ymin, and Ymax. A, B, and C are the normalized parameters for the general form of a line, i.e. Ax+By+C=0. A, B, and C are computed as follows:

SLOPE = tan(θp)
INTERCEPT = Yp - SLOPE * Xp
NORMALIZER = 1 / SQRT(SLOPE**2 + 1)
A = SLOPE * NORMALIZER
B = 0 - NORMALIZER
C = INTERCEPT * NORMALIZER

where SQRT means take the square root of the expression in parentheses, ** means raise to the power of the number that follows, * means multiplication, / means division, + means addition, - means subtraction, and tan means take the tangent.

Xmin, Xmax, Ymin, and Ymax describe a padded bounding box containing the proto-feature. The bounding box is used to quickly eliminate feature/proto-feature pairs from further consideration when they are not a good match. This bounding box is computed using two pad values, TPAD and OPAD. OPAD is a constant value of orthogonal pad, which is 2.5 times the feature length in the preferred embodiment. TPAD and OPAD are used to provide a small amount of additional space around a proto-feature which allows close features to still be considered.

FIG. 4 shows an example set of features that would be extracted from the letters "o" and "I", as those characters are being analyzed on an unknown page of text. As a character is being analyzed, the features extracted are much smaller than the proto-features that are created for a character template. Referring now to FIG. 4, feature 402 would be created by starting at an arbitrary point on the outline of the letter "o", and placing the center of the first feature at this arbitrary point. One method of picking an arbitrary point would be to select the portion of the outline that is located at the largest Y coordinate value. The angle of the feature is defined as the angle of the outline of the dark area of the character at the point.
The feature extractor then moves along the outline, keeping the dark area on the left, for a distance of one feature length and places the center of the second feature, feature 403, at this new point. Unlike proto-features, all features from a character being analyzed are of a fixed length. This length could be any length so long as it is consistent, and in the preferred embodiment, the feature length is one-tenth of the x-height (defined below) of the current line of text. This process would continue until the entire outline has been traversed to create all the features around the outline. The process would be performed for all outlines, including the inner outline to create features 404, etc. Similarly, an arbitrary point would be picked on the "I" character and features would be created in the same manner by traversing the outline.

FIG. 5 shows a diagram of a feature extracted from a character being analyzed. When features are extracted from a character being analyzed, the features are all of the same, fixed, length. Therefore, no length parameter is needed for a feature extracted from a character being analyzed. Referring now to FIG. 5, a feature 502 has a midpoint 504 which is defined by its Xf coordinate 506 and its Yf coordinate 508. The angle of the feature 502, θf 510, is specified in the same manner as the angle θp 308 with respect to proto-features.

FIG. 6 shows a block diagram of the hardware of the present invention. Referring now to FIG. 6, a scanning device 600 contains a processor 602 which communicates with other elements of the scanning device 600 over a system bus 604. Scanner electronics 606 scan a page of textual information and produce a graphical bit image of the contents of the page.
Memory 610 contains the OCR process software 612 of the present invention which uses an operating system 614 to communicate with the scanner electronics 606, and to communicate with a host computer system over a host bus 616 through system interface electronics 608. The OCR process software 612 reads the pixels of the graphical bit image from the scanner electronics 606, processes that image according to the method of the present invention, and sends the result to a host system over the host bus 616. The OCR process software could also run in the host system.

FIG. 7 shows a flow diagram of the overall process of the present invention. Referring now to FIG. 7, a page image 702 is received from the scanner electronics 606 (FIG. 6). This page image is processed by an extract character process 704 which identifies each individual character on a page and places that character into a character image data stream 706. The extraction of characters is well known in the art. The character images 706 are sent to an extract features process 708 of the present invention. The extract features process 708 will be described in detail with respect to FIG. 8. The extract features process 708 creates a list of character features 710 which is sent to a classify character process 712. The classify character process 712 will be described below with respect to FIGS. 9 through 12. The output of the classify character process 712 is a coded characters data stream 714 which contains one or more choices for each character being analyzed. This output is sent to a word processor 716 where it is edited and displayed by the user of the system. The output may be sent through a host system bus to a word processor within a host system.

FIG. 8 shows a flow chart of the extract features process of FIG. 7. FIG. 8 is called after character images 706 (FIG. 7) have been created by the extract characters process 704. Referring now to FIG.
8, after entry, block 802 determines whether all character images have been processed. If more character images remain to be processed, block 802 transfers to block 804 which gets the next character image from the character images data stream 706 (FIG. 7). Block 806 then normalizes the character image.

When proto-features and features are created, as defined above with respect to FIGS. 2 through 5 respectively, the location of the feature is defined by an X and Y coordinate. The ranges for X and Y depend upon the coordinate system used in the matching process. Many different coordinate systems are possible. It is desirable to choose a coordinate system in which characters have been normalized to a constant size. This allows the character classification process to be insensitive to size variation in the characters. There are two basic techniques which can be used to normalize characters, either of which will work with the system of the present invention. The only requirement is that the same form of normalization be used for both features and proto-features.

One such normalization technique is line normalization, where all characters within a line of text are scaled by a single factor. Scaling is uniform in both the X and Y directions, and the scale factor is chosen to force the x-height of the line, that is, the height of a lower case x character in the font, to be a constant. In the preferred embodiment, this constant is 0.5. The base line of the text is also translated so that the position of the baseline for all characters on a line is a constant. In the preferred embodiment, this constant is zero.

A second technique is character normalization, where each character is individually scaled to a fixed size. This scaling can be anisotropic, that is, using different scale factors in the X and Y directions. Some character normalization techniques scale the bounding box of a character to a fixed size.
Other techniques compute the radius of gyration of the character shape about the center of mass in both the X and Y directions and then scale the character according to these numbers. In the preferred embodiment, the character is scaled to a range of 0 to 1.

After normalizing the character image, control transfers to block 808 which determines whether all outlines of the character image have been processed. Some characters inherently have multiple outlines, such as the character "i", which has an outline for the base part and an outline for the "dot". In other situations, a character may have multiple outlines due to distortion as described above with respect to FIG. 1. If an outline remains to be processed, block 808 transfers to block 810 which gets the next outline from the image. Block 812 then determines an arbitrary point on the outline to start feature extraction. As described above with respect to FIG. 4, the features extracted from a character being analyzed may be extracted starting at any arbitrary point on the outline. After determining an arbitrary point, block 812 transfers to block 814 which places the center of the feature at the point just selected. Block 816 determines the angle of the feature by determining the tangent to the dark portion of the outline at the point just selected. Block 818 then writes the feature statistics just collected, that is, the X,Y location of the center of the feature, and the angle of the feature, to the character features data stream 710 (FIG. 7). Block 820 then traverses along the character outline for one feature length. That is, the outline is followed, keeping the dark portion of the outline to the left of the direction being followed, until the fixed length of a feature has been traversed.
After traversing one feature length, block 820 transfers to block 822 which determines whether the end of the outline has been reached and if not, block 822 transfers back to block 814 to create a new feature at the new location point on the outline. After the entire outline has been traversed, block 822 transfers back to block 808 to determine whether additional outlines exist within the character. After all outlines within the character image have been processed, block 808 transfers back to block 802 to determine if additional characters remain in the character images data stream 706 (FIG. 7). After all character images have been processed, FIG. 8 returns to its caller.

FIGS. 9 through 12 show flow charts of the classify character process 712 (FIG. 7). The method involves matching each feature extracted from a character being analyzed to each proto-feature within a template, to determine if the two are "similar". Similarity is determined by the difference in the angles between the feature and the proto-feature, as well as the distance from the feature to the proto-feature. The similarity number is then normalized to the range of zero to one to create a match evidence. A match evidence of zero means that there is no evidence that the feature matches the proto-feature, and a match evidence of one means that the feature is a perfect match to the proto-feature. After the evidence is determined, a match rating is computed by comparing all the features to the proto-features within the templates, and then by also comparing all the proto-features of a template to the features within the character being analyzed. Both cases must be analyzed in order to make sure that the character being analyzed is neither a subset nor a superset of the proto-features within the template. After these two comparisons are made, a match rating of the features of the character being analyzed to the proto-features within this template is computed.
The character image is then compared to the next template, until it has been compared to all templates possible. After all these comparisons are made, the match ratings with the highest numbers are sent as the coded characters data stream 714 (FIG. 7). The details of this method are described below.

FIG. 9 is a flow chart of the top level processing module of the classify character process. Referring now to FIG. 9, after entry, block 902 determines whether all characters have been processed and if not, transfers control to block 904 which gets the character features for the next character. Block 906 then sets a RATELIST variable to "empty" and block 908 determines whether all templates have been processed against this character. If all templates have not been processed against this character, block 908 transfers to block 910 which gets the proto-features from the next template. Block 912 then calls FIG. 10 to match the features from the character being analyzed to the proto-features of the template. Block 914 then calls FIG. 12 to match the proto-features from the template to the features from the character being analyzed, and block 916 then determines the match rating for this character and template combination. The match rating is determined by the following formula:

RATING = (EFAVG * LFTOTAL + EPAVG * LPTOTAL) / (LFTOTAL + LPTOTAL)

The above formula computes the match rating as a weighted average of the average feature evidence (EFAVG) and the average proto evidence (EPAVG). This will be a number between zero and one where one is a perfect match and zero is no match. The feature evidence is weighted by the total length of all features (LFTOTAL) and the proto-feature evidence is weighted by the total length of all proto-features (LPTOTAL).

After computing the match rating for this character/template combination, block 918 puts this match rating in the RATELIST and transfers back to block 908 to process the next template.
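The weighted average used for the match rating can be illustrated with a short Python sketch. The function name `match_rating` and the guard for a zero total length are assumptions made for this illustration only; the variable names mirror EFAVG, EPAVG, LFTOTAL, and LPTOTAL from the formula above.

```python
def match_rating(ef_avg: float, lf_total: float,
                 ep_avg: float, lp_total: float) -> float:
    """Length-weighted average of the feature and proto-feature evidences.

    ef_avg   -- average feature evidence (EFAVG), between 0 and 1
    lf_total -- total length of all features (LFTOTAL)
    ep_avg   -- average proto evidence (EPAVG), between 0 and 1
    lp_total -- total length of all proto-features (LPTOTAL)
    """
    total_length = lf_total + lp_total
    if total_length == 0:
        # Assumption of this sketch: with nothing to compare, report no match.
        return 0.0
    return (ef_avg * lf_total + ep_avg * lp_total) / total_length


# Building a rate list for one character, one entry per template.
# The evidence and length values here are made up for illustration.
rate_list = []
for char_code, (ef, lf, ep, lp) in {"o": (0.9, 12.0, 0.8, 12.0),
                                    "c": (0.9, 12.0, 0.5, 9.0)}.items():
    rate_list.append((char_code, match_rating(ef, lf, ep, lp)))
```

Because both evidences lie between zero and one and the weights are non-negative lengths, the rating stays between zero and one, as the text states.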
After the character image has been processed against all templates, block 908 transfers to block 920 which sorts the RATELIST in order of descending match rating. Block 922 then extracts the highest matches and all matches that are close to the highest match. In this manner, the method can output the best possible choices for the character. In the preferred embodiment, the character represented by the highest match rating is output, and all characters that are within 0.15 of the highest match rating are considered "close" and are also output.

After selecting the character with the highest match rating, and any characters that are close to the highest match rating, block 924 sends the coded characters for the extracted matches, such as coded characters from the ASCII character set, to the output data stream 714 (FIG. 7) before returning to block 902 to process the next character. After all characters have been processed, block 902 returns to its caller to allow a higher level of character level classification to proceed.

FIG. 10 shows a flow chart of the determine features to proto average called from block 912 of FIG. 9. Referring now to FIG. 10, after entry, block 1002 sets a variable TOTAL equal to zero and sets another variable NUMFEAT, which represents the number of features, equal to zero. Block 1004 then determines whether all features of the character being analyzed have been processed, and if not, block 1004 transfers to block 1006 which increments the value of the variable NUMFEAT. Block 1008 then gets the next feature and block 1010 sets a value of a variable BEST equal to zero. Block 1012 then determines whether all proto-features of the template have been processed and if not, transfers to block 1020 which gets the next proto-feature. Block 1022 then calls FIG. 11 to determine the match evidence for the feature obtained in block 1008 compared to the proto-feature obtained in block 1020.
After determining the match evidence, block 1024 then determines whether the evidence returned from FIG. 11 is greater than the best evidence determined so far. If the evidence is greater than BEST, block 1024 transfers to block 1026 which sets BEST equal to the new evidence. If the evidence is less than or equal to BEST, or after setting BEST equal to the evidence, control transfers back to block 1012 to check the next proto-feature within the template. After all proto-features within the template have been processed, block 1012 transfers to block 1014 which adds the value of the variable BEST to the value of the variable TOTAL. Block 1014 then transfers back to block 1004 to determine whether all features within the character being analyzed have been processed. After all features in the character have been processed, block 1004 transfers to block 1016 which sets a variable EFAVG to the value of the variable TOTAL divided by the value of the variable NUMFEAT. Block 1018 then sets the value of a variable LFTOTAL equal to the value of the variable NUMFEAT multiplied by the value of a variable FEATLEN, which is the length of each feature. As described above, the length of a feature extracted from a character being analyzed is always a fixed value, so the value of LFTOTAL is simply the number of features multiplied by this fixed length. After computing the values for EFAVG and LFTOTAL, control returns to FIG. 9.

FIG. 11 shows a flow chart of the determine match evidence process called from block 1022 of FIG. 10. Referring now to FIG. 11, after entry, block 1102 determines whether the feature is within the bounding box of the template. As described above with respect to FIGS. 2 and 3, a proto-feature within a template has a bounding box defined for it.
If a feature from a character being analyzed is located outside this bounding box, as defined by comparing the X and Y coordinates of the feature midpoint to the Xmin, Xmax, Ymin, and Ymax parameters defined above, the similarity of the feature to the proto-feature will be so large as to be beyond consideration. Therefore, if the feature is not within the bounding box, block 1102 transfers to block 1112 which sets the match evidence value to zero before returning to FIG. 10.

If the feature is within the bounding box, block 1102 transfers to block 1104 which sets a variable ANGLEDIFF to the square of the difference between the angle of the feature and the angle of the proto-feature. The angle difference is computed in a circular fashion, that is, the difference between an angle of zero and an angle of 1 is zero, and ANGLEDIFF is never greater than 0.5 squared, i.e. 0.25. After computing the angle difference, block 1104 transfers to block 1106 which computes the value of a distance variable by squaring the sum of the value of the parameter A for the proto-feature multiplied by the X location of the feature, the value of B for the proto-feature multiplied by the Y location of the feature, and the value of C for the proto-feature. The parameters A, B, and C were defined above with respect to FIGS. 2 and 3. This distance is the distance between the location of the center of the feature and the line of the proto-feature. Block 1108 then computes a similarity variable as the angle difference times a constant K, plus the distance computed in block 1106. The constant K is used to adjust the relative contribution of the angle difference and the distance difference to the similarity measure. In the present invention, the constant K is set to a value of one.
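The bounding-box test, circular angle difference, and distance computation of blocks 1102 through 1108 can be sketched as follows. This is an illustrative sketch only: the `Feature` and `ProtoFeature` types, their field names, and the use of `None` to signal a feature outside the bounding box are assumptions, and the code assumes A and B are scaled so that A*x + B*y + C is a point-to-line distance.

```python
from dataclasses import dataclass
from typing import Optional

K = 1.0  # relative weight of the angle term; the text sets K to one


@dataclass
class Feature:
    x: float      # X coordinate of the feature midpoint
    y: float      # Y coordinate of the feature midpoint
    angle: float  # direction, in units where 0 and 1 denote the same angle


@dataclass
class ProtoFeature:
    a: float      # line parameters A, B, C of the proto-feature
    b: float
    c: float
    angle: float
    xmin: float   # bounding box of the proto-feature
    xmax: float
    ymin: float
    ymax: float


def similarity(f: Feature, p: ProtoFeature) -> Optional[float]:
    """Return K * ANGLEDIFF + distance, or None if f lies outside p's bounding box."""
    if not (p.xmin <= f.x <= p.xmax and p.ymin <= f.y <= p.ymax):
        return None  # block 1112 would set the match evidence to zero here
    d = abs(f.angle - p.angle) % 1.0
    d = min(d, 1.0 - d)            # circular difference, never more than 0.5
    angle_diff = d * d             # ANGLEDIFF, never more than 0.25
    distance = (p.a * f.x + p.b * f.y + p.c) ** 2
    return K * angle_diff + distance
```

Under these assumptions a value of zero corresponds to a feature whose angle equals the proto-feature angle and whose midpoint lies on the proto-feature line; larger values indicate a poorer fit.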
After computing the similarity, block 1108 transfers to block 1110 which computes the match evidence by dividing the similarity by a constant SM, squaring this result, adding one to the square, and dividing all of this into one. In this manner, the match evidence will vary from zero to one, where zero means no match and one is a perfect match. The constant SM defines what value of similarity will map to an evidence value of 0.5, that is, the midpoint. In the system of the present invention, the constant SM is set to a value of 0.0075. After computing the match evidence, FIG. 11 returns to FIG. 10.

FIG. 12 shows a flow chart of the determine proto to features average process called from block 914 of FIG. 9. This process will match each proto-feature of the template to each feature from the character being analyzed. Referring now to FIG. 12, after entry, block 1202 sets three variables equal to zero: the variables TOTAL, LPTOTAL, and NUMMATCH. Block 1204 then determines whether all proto-features have been processed and if not, transfers control to block 1206 which gets the next proto-feature from the template. Block 1208 then adds the length of this proto-feature to the value of the variable LPTOTAL, and block 1210 sets the value of a variable called MATCHLIST to "empty". Block 1212 then determines whether all features in the character have been processed and if not, transfers control to block 1213 which gets the next feature. Block 1214 calls FIG. 11 to determine the match evidence between this feature and the proto-feature retrieved in block 1206. After returning from FIG. 11, block 1216 appends the match evidence from the comparison to MATCHLIST and then returns to block 1212 to process the next feature. After all features within the character have been processed, block 1212 transfers to block 1220 which sorts MATCHLIST in order of decreasing evidence; thus, the features that most closely match this proto-feature sort to the top of the list.
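The evidence mapping of block 1110 described above can be written compactly. In this sketch the function name `match_evidence` is an assumption; the constant SM and the formula follow the description of FIG. 11.

```python
SM = 0.0075  # similarity value that maps to an evidence of 0.5, per the text


def match_evidence(similarity: float, sm: float = SM) -> float:
    """Map a similarity value (zero is best) to an evidence value in (0, 1]."""
    return 1.0 / (1.0 + (similarity / sm) ** 2)


perfect = match_evidence(0.0)   # similarity of zero gives the maximum evidence
midpoint = match_evidence(SM)   # similarity equal to SM gives the midpoint
```

The mapping falls off quickly: a similarity several times larger than SM already yields an evidence close to zero.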
Block 1222 then sets a variable NUM_TO_KEEP to the value of the length of this proto-feature divided by the feature length of the features from the character being analyzed. As discussed above, this feature length is a fixed number for all the features from the character being analyzed. Thus, the variable NUM_TO_KEEP identifies how many features would fit along the length of the proto-feature. Block 1224 then extracts this number of elements from the front of the match list and block 1226 adds the evidence values of all these elements together. Block 1228 then adds the sum just created to the value of the variable TOTAL. Block 1230 adds the value of the variable NUM_TO_KEEP to the value of the variable NUMMATCH before returning to block 1204 to determine if additional proto-features need to be processed for this template. After all proto-features have been processed for this template, block 1204 transfers to block 1218 which creates a value for the variable EPAVG by dividing the value of the variable TOTAL by the value of the variable NUMMATCH before returning EPAVG and LPTOTAL to FIG. 9.

After all characters have been analyzed, and a coded character selected for each character image, the coded characters are sent to a host system over the host system bus 616 (FIG. 6) via the system interface 608 (FIG. 6). The host system then stores the coded characters in a file where they can be processed using standard word processing systems.

It is also possible to use the feature extraction and match process described above to detect the presence of high-level features which are then used in a further matching process to classify a character. Using this process, templates contain high-level features, such as bays, closures, etc., rather than an entire character shape. Once the features have been extracted from an unknown character they are matched to the template, as described above, to identify the high-level features that are contained in the unknown character.
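The two averaging passes of FIG. 10 and FIG. 12 described above can be sketched side by side. This is a minimal sketch under stated assumptions: the function names are hypothetical, the match evidence of every feature/proto-feature pair is assumed to be precomputed in a matrix `evidence[i][j]` (feature i against proto-feature j), NUM_TO_KEEP is taken as the whole number of features that fit along the proto-feature, and empty inputs are mapped to an average of zero.

```python
from typing import Sequence, Tuple


def feature_to_proto_average(
    evidence: Sequence[Sequence[float]], feat_len: float
) -> Tuple[float, float]:
    """FIG. 10: average, over features, of the best evidence against any proto-feature."""
    num_feat = len(evidence)
    total = sum(max(row, default=0.0) for row in evidence)  # BEST per feature
    efavg = total / num_feat if num_feat else 0.0
    lftotal = num_feat * feat_len
    return efavg, lftotal


def proto_to_feature_average(
    evidence: Sequence[Sequence[float]],
    proto_lengths: Sequence[float],
    feat_len: float,
) -> Tuple[float, float]:
    """FIG. 12: average of the top NUM_TO_KEEP evidences for each proto-feature."""
    total = 0.0
    lptotal = 0.0
    num_match = 0
    for j, length in enumerate(proto_lengths):
        lptotal += length
        # MATCHLIST: evidence of every feature against proto-feature j, best first.
        match_list = sorted((row[j] for row in evidence), reverse=True)
        num_to_keep = int(length // feat_len)
        total += sum(match_list[:num_to_keep])
        num_match += num_to_keep
    epavg = total / num_match if num_match else 0.0
    return epavg, lptotal


# Three features compared against two proto-features of lengths 2.0 and 1.0.
ev = [[0.9, 0.1], [0.8, 0.2], [0.0, 0.6]]
efavg, lftotal = feature_to_proto_average(ev, feat_len=1.0)
epavg, lptotal = proto_to_feature_average(ev, [2.0, 1.0], feat_len=1.0)
```

Note that, following the text, NUMMATCH grows by NUM_TO_KEEP even when fewer features are available, so a long proto-feature matched by too few features lowers EPAVG.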
Decision trees or other standard matching methods are then used to classify the unknown character based on the high-level features found.

Having thus described a presently preferred embodiment of the present invention, it will now be appreciated that the objects of the invention have been fully achieved, and it will be understood by those skilled in the art that many changes in construction and circuitry and widely differing embodiments and applications of the invention will suggest themselves without departing from the spirit and scope of the present invention. The disclosures and the description herein are intended to be illustrative and are not in any sense limiting of the invention, more preferably defined in scope by the following claims.

What is claimed is:

1. A method for optical character recognition comprising the steps of:
(a) converting a page having a plurality of text printed thereon into a graphical image containing a plurality of pixel elements representative of said text;
(b) separating said graphical image into a plurality of character images;
(c) scanning said character images to produce a set, containing a plurality of features, for each character image of said plurality of character images;
(d) converting each of said sets of features into at least
one coded character equivalent to each of said character images comprising the steps of
  (d1) selecting one of said sets,
  (d2) selecting one of a plurality of templates, one of said templates having been previously defined for each character to be converted, wherein each of said templates contains a plurality of proto-features,
  (d3) comparing each of said features to each of said proto-features to create a rating comprising the steps of
    (d3a) comparing each of said features from said selected set to each of said proto-features from said selected template to create an average feature match evidence,
    (d3b) comparing each of said proto-features from said selected template to each of said features from said selected set to create an average proto match evidence,
    (d3c) computing a rating by averaging said feature match evidence and said proto match evidence;
  (d4) adding said rating to a rating list,
  (d5) repeating steps (d2) through (d4) for each of said templates,
  (d6) selecting at least one highest rating within said rating list as said coded character equivalent to said character image, and
  (d7) repeating steps (d2) through (d6) for each of said character images; and
(e) sending said coded characters to a word processor for editing and display.

2.
The process of claim 1 wherein step (d3a) further comprises the steps of:
(d3a1) selecting one of said features;
(d3a2) selecting one of said proto-features;
(d3a3) computing an angle difference value between an angle of said selected feature and an angle of said selected proto-feature;
(d3a4) computing a distance difference value between a center of said feature and said proto-feature;
(d3a5) computing a similarity as the sum of said angle difference value and said distance difference value;
(d3a6) computing an evidence value by normalizing said similarity to a predetermined range of values;
(d3a7) repeating steps (d3a2) through (d3a6) for each of said proto-features and selecting a highest of said evidence values;
(d3a8) adding said selected highest evidence value to a total value;
(d3a9) repeating steps (d3a1) through (d3a8) for each of said features; and
(d3a10) dividing said total value by a number of features to create said average feature match evidence.

3. The process of claim 2 wherein step (d3a2) further comprises the steps of:
(d3a2a) comparing a location of said selected feature to locations of each of said proto-features; and
(d3a2b) setting said evidence value to zero and continuing with step (d3a7) if said feature is located outside a predefined area surrounding said proto-feature.

4.
The process of claim 1 wherein step (d3b) further comprises the steps of:
(d3b1) selecting one of said proto-features;
(d3b2) selecting one of said features;
(d3b3) computing an angle difference value between an angle of said selected feature and an angle of said selected proto-feature;
(d3b4) computing a distance difference value between a center of said feature and said proto-feature;
(d3b5) computing a similarity as the sum of said angle difference value and said distance difference value;
(d3b6) computing an evidence value by normalizing said similarity to a predetermined range of values;
(d3b7) repeating steps (d3b2) through (d3b6) for each of said features and appending each of said evidence values to a match list;
(d3b8) sorting said match list;
(d3b9) determining a largest number of integral features that will fit within a length of said proto-feature;
(d3b10) selecting a number of evidence values from a first of said sorted match list equal to said largest number of integral features and adding said number to a number matched value;
(d3b11) adding a sum of said selected number of evidence values to a total value;
(d3b12) repeating steps (d3b1) through (d3b11) for each of said proto-features; and
(d3b13) dividing said total value by said number matched value to create said average proto match evidence.

5. The process of claim 4 wherein step (d3b2) further comprises the steps of:
(d3b2a) comparing a location of said selected feature to locations of each of said proto-features; and
(d3b2b) setting said evidence value to zero and continuing with step (d3a7) if said feature is located outside a predefined area surrounding said proto-feature.

6.
The process of claim 1 wherein step (c) further comprises the steps of:
(c1) separating said character images into a plurality of outlines each defined by a boundary between pixels of different intensity within said character images;
(c2) locating an arbitrary point on one of said outlines;
(c3) defining one of said plurality of features at said point and adding said feature to said set for said character image;
(c4) traversing said outline to a new point at a predetermined distance from said point;
(c5) repeating steps (c3) and (c4) until said outline is completely traversed; and
(c6) repeating steps (c2) through (c5) for each of said outlines within said character image to complete said set; and
(c7) repeating steps (c1) through (c6) for each of said character images.

7. A method for optical character recognition comprising the steps of:
(a) converting a page having a plurality of text printed thereon into a graphical image containing a plurality of pixel elements representative of said text;
(b) separating said graphical image into a plurality of character images;
(c) scanning said character images to produce a set, containing a plurality of features, for each character image of said plurality of character images comprising the steps of
  (c1) separating said character images into a plurality of outlines each defined by a boundary between pixels of different intensity within said character images,
  (c2) locating an arbitrary point on one of said outlines,
  (c3) defining one of said plurality of features at said point and adding said feature to said set for said character image,
  (c4) traversing said outline to a new point at a predetermined distance from said point,
  (c5) repeating steps (c3) and (c4) until said outline is completely traversed, and
  (c6) repeating steps (c2) through (c5) for each of said outlines within said character image to complete said set;
  (c7) repeating steps (c2) through (c6) for each of said character images;
(d) converting
each of said sets of features into at least one coded character equivalent to each of said character images; and
(e) sending said coded characters to a word processor for editing and display.

[...] selected template to create an average feature match evidence,
(d3b) comparing each of said proto-features from said selected template to each of said features from said selected set to create an average proto match evidence,
(d3c) computing a rating by averaging said feature match evidence and said proto match evidence.

10. The process of claim 9 wherein step (d3a) further comprises the steps of:
(d3a1) selecting one of said features;
(d3a2) selecting one of said proto-features;
(d3a3) computing an angle difference value between an angle of said selected feature and an angle of said selected proto-feature;
(d3a4) computing a distance difference value between a center of said feature and said proto-feature;
(d3a5) computing a similarity as the sum of said angle difference value and said distance difference value;
(d3a6) computing an evidence value by normalizing said similarity to a predetermined range of values;
(d3a7) repeating steps (d3a2) through (d3a6) for each of said proto-features and selecting a highest of said evidence values;
(d3a8) adding said selected highest evidence value to a total value;
(d3a9) repeating steps (d3a1) through (d3a8) for each of said features; and
(d3a10) dividing said total value by a number of features to create said average feature match evidence.

11. The process of claim 10 wherein step (d3a2) further comprises the steps of:
(d3a2a) comparing a location of said selected feature to locations of each of said proto-features; and
(d3a2b) setting said evidence value to zero and continuing with step (d3a7) if said feature is located outside a predefined area surrounding said proto-feature.

12.
The process of claim 9 wherein step (d3b) further comprises the steps of:
(d3b1) selecting one of said proto-features;
(d3b2) selecting one of said features;
(d3b3) computing an angle difference value between an angle of said selected feature and an angle of said selected proto-feature;
(d3b4) computing a distance difference value between a center of said feature and said proto-feature;
(d3b5) computing a similarity as the sum of said angle difference value and said distance difference value;
(d3b6) computing an evidence value by normalizing said similarity to a predetermined range of values;
(d3b7) repeating steps (d3b2) through (d3b6) for each of said features and appending each of said evidence values to a match list;
(d3b8) sorting said match list;
(d3b9) determining a largest number of integral features that will fit within a length of said proto-feature;
(d3b10) selecting a number of evidence values from a first of said sorted match list equal to said largest number of integral features and adding said number to a number matched value;
(d3b11) adding a sum of said selected number of evidence values to a total value;
(d3b12) repeating steps (d3b1) through (d3b11) for each of said proto-features; and
(d3b13) dividing said total value by said number matched value to create said average proto match evidence.

13. The process of claim 12 wherein step (d3b2) further comprises the steps of:
(d3b2a) comparing a location of said selected feature to locations of each of said proto-features; and
(d3b2b) setting said evidence value to zero and continuing with step (d3a7) if said feature is located outside a predefined area surrounding said proto-feature.