wordrec/metrics.h File Reference

#include "measure.h"
#include "bestfirst.h"
#include "states.h"



Function Documentation

void end_metrics (  ) 

Clean up memory used by metrics.

Note:
New in v1.03

Definition at line 128 of file metrics.cpp.

References best_certainties, character_widths, memfree(), NULL, and states_before_best.

Referenced by program_editdown().

void init_metrics (  ) 

Set up the appropriate variables to record information about the OCR process.

Later calls will log the data and save a summary.

Definition at line 95 of file metrics.cpp.

References best_certainties, CERTAINTY_BUCKETS, character_count, chars_classified, chops_attempted1, chops_attempted2, chops_performed1, chops_performed2, min, new_tally(), num_seg_states, permutation_count, save_priorities, segmentation_states1, segmentation_states2, states_before_best, states_timed_out1, states_timed_out2, word_count, words_chopped1, words_chopped2, words_segmented1, and words_segmented2.

Referenced by program_editup2().

void init_metrics() {
  words_chopped1 = 0;
  words_chopped2 = 0;
  chops_performed1 = 0;
  chops_performed2 = 0;
  chops_attempted1 = 0;
  chops_attempted2 = 0;

  words_segmented1 = 0;
  words_segmented2 = 0;
  states_timed_out1 = 0;
  states_timed_out2 = 0;
  segmentation_states1 = 0;
  segmentation_states2 = 0;

  save_priorities = 0;

  character_count = 0;
  word_count = 0;
  chars_classified = 0;
  permutation_count = 0;

  states_before_best = new_tally (min (100, num_seg_states));

  best_certainties[0] = new_tally (CERTAINTY_BUCKETS);
  best_certainties[1] = new_tally (CERTAINTY_BUCKETS);
}

void record_certainty ( float  certainty,
int  pass 
)

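Maintain a record of the best certainty value achieved on each word recognition. The certainty is divided by CERTAINTY_BUCKET_SIZE to pick a bucket, and pass (1 or 2) selects the tally, best_certainties[pass - 1].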

Definition at line 145 of file metrics.cpp.

References best_certainties, CERTAINTY_BUCKET_SIZE, inc_tally_bucket, and MAXINT.

Referenced by classify_word_pass1(), and classify_word_pass2().

void record_certainty(float certainty, int pass) {
  int bucket;

  if (certainty / CERTAINTY_BUCKET_SIZE < MAXINT)
    bucket = (int) (certainty / CERTAINTY_BUCKET_SIZE);
  else
    bucket = MAXINT;

  inc_tally_bucket (best_certainties[pass - 1], bucket);
}
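The bucketing step above can be illustrated with a stand-alone sketch. Everything named Sketch* or SKETCH_* here is hypothetical: the bucket count and bucket size are placeholder values, not the real CERTAINTY_BUCKETS and CERTAINTY_BUCKET_SIZE, and clamping out-of-range values into the first or last bucket is an assumption about how a tally might handle them, not a statement about inc_tally_bucket.

```c
#define SKETCH_BUCKETS 16         /* hypothetical stand-in for CERTAINTY_BUCKETS */
#define SKETCH_BUCKET_SIZE 0.5f   /* hypothetical stand-in for CERTAINTY_BUCKET_SIZE */

/* Hypothetical tally: a total count plus one counter per bucket. */
typedef struct {
  int count;
  int buckets[SKETCH_BUCKETS];
} SketchTally;

/* Map a certainty to a bucket index, clamped to [0, SKETCH_BUCKETS - 1]. */
static int certainty_to_bucket(float certainty) {
  float scaled = certainty / SKETCH_BUCKET_SIZE;
  if (!(scaled >= 0.0f))                       /* negative or NaN */
    return 0;
  if (scaled >= (float)(SKETCH_BUCKETS - 1))   /* too large for the tally */
    return SKETCH_BUCKETS - 1;
  return (int)scaled;
}

/* Record one certainty for pass 1 or pass 2; other pass values are ignored. */
static void sketch_record_certainty(SketchTally tallies[2], float certainty,
                                    int pass) {
  SketchTally *t;
  if (pass < 1 || pass > 2)
    return;
  t = &tallies[pass - 1];
  t->count++;
  t->buckets[certainty_to_bucket(certainty)]++;
}
```

Unlike the listing, this sketch validates pass before indexing and clamps the index itself, so it does not rely on the tally to cope with an index of MAXINT.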

void record_priorities ( SEARCH_RECORD the_search,
STATE old_state,
FLOAT32  priority_1,
FLOAT32  priority_2 
)

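Record priorities.

If the record mode is set, record the priorities returned by each of the priority voters and save them in a file that is set up for clustering. In the listing below, the body only forwards priority_1 and priority_2 to record_samples(); the_search and old_state are not used.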

Definition at line 267 of file metrics.cpp.

References record_samples().

Referenced by prioritize_state().

void record_priorities(SEARCH_RECORD the_search,
                       STATE old_state,
                       FLOAT32 priority_1,
                       FLOAT32 priority_2) {
  record_samples(priority_1, priority_2);
}

void record_samples ( FLOAT32  match_pri,
FLOAT32  width_pri 
)

Remember the priority samples to summarize them later.

Definition at line 279 of file metrics.cpp.

References ADD_SAMPLE, match_priority_range, and width_priority_range.

Referenced by record_priorities().

void record_samples(FLOAT32 match_pri, FLOAT32 width_pri) {
  ADD_SAMPLE(match_priority_range, match_pri);
  ADD_SAMPLE(width_priority_range, width_pri);
}
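How samples might be remembered and summarized later can be sketched as follows. This assumes that ADD_SAMPLE accumulates a count, a sum, and a sum of squares, as the MEASUREMENT field names listed under reset_width_tally() (num_samples, sum_of_samples, sum_of_squares) suggest. The Sketch* type and sketch_* functions are hypothetical and are not taken from measure.h.

```c
/* Hypothetical mirror of the three MEASUREMENT fields named in this page. */
typedef struct {
  int num_samples;
  float sum_of_samples;
  float sum_of_squares;
} SketchMeasurement;

/* Reset the accumulator to an empty state. */
static void sketch_new_measurement(SketchMeasurement *m) {
  m->num_samples = 0;
  m->sum_of_samples = 0.0f;
  m->sum_of_squares = 0.0f;
}

/* Fold one sample into the running sums (assumed ADD_SAMPLE behavior). */
static void sketch_add_sample(SketchMeasurement *m, float sample) {
  m->num_samples++;
  m->sum_of_samples += sample;
  m->sum_of_squares += sample * sample;
}

/* Mean of the samples seen so far, or 0 if there are none. */
static float sketch_mean(const SketchMeasurement *m) {
  if (m->num_samples <= 0)
    return 0.0f;
  return m->sum_of_samples / (float)m->num_samples;
}

/* Population variance, E[x^2] - mean^2, clamped at 0 against rounding. */
static float sketch_variance(const SketchMeasurement *m) {
  float mean, var;
  if (m->num_samples <= 0)
    return 0.0f;
  mean = sketch_mean(m);
  var = m->sum_of_squares / (float)m->num_samples - mean * mean;
  return var < 0.0f ? 0.0f : var;
}
```

Keeping only three running totals means the summary can be produced at the end without storing the individual priority samples.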

void record_search_status ( int  num_states,
int  before_best,
float  closeness 
)

Record information about each iteration of the search.

This data is kept in global memory and accumulated over multiple segmenter searches.

Definition at line 164 of file metrics.cpp.

References first_pass, inc_tally_bucket, num_seg_states, segmentation_states1, segmentation_states2, states_before_best, states_timed_out1, states_timed_out2, words_segmented1, and words_segmented2.

Referenced by delete_search().

void record_search_status(int num_states, int before_best, float closeness) {
  inc_tally_bucket(states_before_best, before_best);

  if (first_pass) {
    if (num_states == num_seg_states + 1)
      states_timed_out1++;
    segmentation_states1 += num_states;
    words_segmented1++;
  }
  else {
    if (num_states == num_seg_states + 1)
      states_timed_out2++;
    segmentation_states2 += num_states;
    words_segmented2++;
  }
}
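The per-pass accumulation above could also be expressed with one counter record per pass instead of paired globals. The following is a minimal sketch of that idea, not the Tesseract implementation: SketchPassStats, sketch_record_search, and the state_limit parameter (standing in for num_seg_states) are hypothetical names.

```c
/* Hypothetical per-pass counters, replacing the *1 / *2 global pairs. */
typedef struct {
  int words_segmented;
  int segmentation_states;
  int states_timed_out;
} SketchPassStats;

/* Accumulate the result of one segmenter search.
 * stats[0] holds pass 1 and stats[1] holds pass 2.
 * A search that used exactly state_limit + 1 states is counted as timed out,
 * mirroring the num_seg_states + 1 comparison in the listing. */
static void sketch_record_search(SketchPassStats stats[2], int first_pass,
                                 int num_states, int state_limit) {
  SketchPassStats *s = &stats[first_pass ? 0 : 1];
  if (num_states == state_limit + 1)
    s->states_timed_out++;
  s->segmentation_states += num_states;
  s->words_segmented++;
}
```

Passing the counters in explicitly keeps the sketch testable; the listing keeps them in global memory so that save_summary() can read them later.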

void reset_width_tally (  ) 

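Create a 20-bucket tally for character_widths and reset width_measure, seeding it with fixed starting values (num_samples = 158, sum_of_samples = 125.0, sum_of_squares = 118.0).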

Definition at line 289 of file metrics.cpp.

References character_widths, new_measurement, new_tally(), MEASUREMENT::num_samples, MEASUREMENT::sum_of_samples, MEASUREMENT::sum_of_squares, and width_measure.

Referenced by program_editup2().

void reset_width_tally() {
  character_widths = new_tally (20);
  new_measurement(width_measure);
  width_measure.num_samples = 158;
  width_measure.sum_of_samples = 125.0;
  width_measure.sum_of_squares = 118.0;
}

void save_best_state ( CHUNKS_RECORD *  chunks_record  ) 

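Save this state away to be compared later. This only happens when save_priorities is set: the function displays the segmentation, prompts the user to enter the correct segmentation as a hexadecimal value, and stores the result in known_best_state.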

Definition at line 303 of file metrics.cpp.

References bin_to_chunks(), CHUNKS_RECORD::chunks, cprintf(), display_segmentation(), free_state(), known_best_state, matrix_dimension, memfree(), new_state(), num_joints, STATE::part1, STATE::part2, CHUNKS_RECORD::ratings, save_priorities, segm_window, and window_wait().

Referenced by best_first_search().

void save_best_state(CHUNKS_RECORD *chunks_record) {
  STATE state;
  SEARCH_STATE chunk_groups;
  int num_joints;

  if (save_priorities) {
    num_joints = matrix_dimension (chunks_record->ratings) - 1;

    state.part1 = 0xffffffff;
    state.part2 = 0xffffffff;

    chunk_groups = bin_to_chunks (&state, num_joints);
    display_segmentation (chunks_record->chunks, chunk_groups);
    memfree(chunk_groups);

    cprintf ("Enter the correct segmentation > ");
    fflush(stdout);
    state.part1 = 0;
    scanf ("%x", &state.part2);

    chunk_groups = bin_to_chunks (&state, num_joints);
    display_segmentation (chunks_record->chunks, chunk_groups);
    memfree(chunk_groups);
    window_wait(segm_window);  /* == 'n') */

    if (known_best_state)
      free_state(known_best_state);
    known_best_state = new_state (&state);
  }
}

void save_summary ( INT32  elapsed_time  ) 

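Save the summary information to a file named after the image file, with ".sta" appended (imagefile + ".sta"). The whole body is compiled out when SECURE_NAMES is defined.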

Definition at line 186 of file metrics.cpp.

References best_certainties, CERTAINTY_BUCKET_SIZE, character_count, chars_classified, CHARS_PER_LINE, chops_attempted1, chops_attempted2, chops_performed1, chops_performed2, cprintf(), dj_statistics(), f, imagefile, INT32FORMAT, iterate_tally, open_file(), permutation_count, print_tally(), PrintIntMatcherStats(), segmentation_states1, segmentation_states2, states_before_best, states_timed_out1, states_timed_out2, tally_entry, word_count, words_chopped1, words_chopped2, words_segmented1, and words_segmented2.

Referenced by program_editdown().

void save_summary(INT32 elapsed_time) {
  #ifndef SECURE_NAMES
  char outfilename[CHARS_PER_LINE];
  FILE *f;
  int x;
  int total;

  strcpy(outfilename, imagefile);
  strcat (outfilename, ".sta");
  f = open_file (outfilename, "w");

  fprintf (f, INT32FORMAT " seconds elapsed\n", elapsed_time);
  fprintf (f, "\n");

  fprintf (f, "%d characters\n", character_count);
  fprintf (f, "%d words\n", word_count);
  fprintf (f, "\n");

  fprintf (f, "%d permutations performed\n", permutation_count);
  fprintf (f, "%d characters classified\n", chars_classified);
  fprintf (f, "%4.0f%% classification overhead\n",
    (float) chars_classified / character_count * 100.0 - 100.0);
  fprintf (f, "\n");

  fprintf (f, "%d words chopped (pass 1) ", words_chopped1);
  fprintf (f, " (%0.0f%%)\n", (float) words_chopped1 / word_count * 100);
  fprintf (f, "%d chops performed\n", chops_performed1);
  fprintf (f, "%d chops attempted\n", chops_attempted1);
  fprintf (f, "\n");

  fprintf (f, "%d words joined (pass 1)", words_segmented1);
  fprintf (f, " (%0.0f%%)\n", (float) words_segmented1 / word_count * 100);
  fprintf (f, "%d segmentation states\n", segmentation_states1);
  fprintf (f, "%d segmentations timed out\n", states_timed_out1);
  fprintf (f, "\n");

  fprintf (f, "%d words chopped (pass 2) ", words_chopped2);
  fprintf (f, " (%0.0f%%)\n", (float) words_chopped2 / word_count * 100);
  fprintf (f, "%d chops performed\n", chops_performed2);
  fprintf (f, "%d chops attempted\n", chops_attempted2);
  fprintf (f, "\n");

  fprintf (f, "%d words joined (pass 2)", words_segmented2);
  fprintf (f, " (%0.0f%%)\n", (float) words_segmented2 / word_count * 100);
  fprintf (f, "%d segmentation states\n", segmentation_states2);
  fprintf (f, "%d segmentations timed out\n", states_timed_out2);
  fprintf (f, "\n");

  total = 0;
  iterate_tally (states_before_best, x)
    total += (tally_entry (states_before_best, x) * x);
  fprintf (f, "segmentations (before best) = %d\n", total);
  if (total != 0.0)
    fprintf (f, "%4.0f%% segmentation overhead\n",
      (float) (segmentation_states1 + segmentation_states2) /
      total * 100.0 - 100.0);
  fprintf (f, "\n");

  print_tally (f, "segmentations (before best)", states_before_best);

  iterate_tally (best_certainties[0], x)
    cprintf ("best certainty of %8.4f = %4d %4d\n",
    x * CERTAINTY_BUCKET_SIZE,
    tally_entry (best_certainties[0], x),
    tally_entry (best_certainties[1], x));

  PrintIntMatcherStats(f);
  dj_statistics(f);
  fclose(f);
  #endif
}
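Two calculations recur in the summary above: an overhead percentage of the form performed / needed * 100 - 100, and the index-weighted total of a tally. The sketch below isolates both. The function names are hypothetical, and the guard against a zero denominator is an addition: the listing guards only the segmentation overhead (total != 0.0) and divides by character_count and word_count unconditionally.

```c
/* Compute performed / needed * 100 - 100 into *out.
 * Returns 1 on success, or 0 (leaving *out untouched) when needed <= 0,
 * where the percentage is undefined. */
static int sketch_percent_overhead(int performed, int needed, float *out) {
  if (needed <= 0)
    return 0;
  *out = (float)performed / (float)needed * 100.0f - 100.0f;
  return 1;
}

/* Sum of bucket_count * bucket_index over all buckets, as in the
 * "segmentations (before best)" loop. */
static int sketch_weighted_total(const int *buckets, int num_buckets) {
  int total = 0;
  int x;
  for (x = 0; x < num_buckets; x++)
    total += buckets[x] * x;
  return total;
}
```

For instance, 150 classifications for 100 characters gives an overhead of 50%, and a caller can skip the corresponding fprintf when the function returns 0.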

void start_recording (  ) 

Set up everything needed to record the priority voters.

Definition at line 340 of file metrics.cpp.

References open_file(), priority_file_1, priority_file_2, priority_file_3, and save_priorities.

Referenced by best_first_search().

void start_recording() {
  if (save_priorities) {
    priority_file_1 = open_file ("Priorities1", "w");
    priority_file_2 = open_file ("Priorities2", "w");
    priority_file_3 = open_file ("Priorities3", "w");
  }
}

void stop_recording (  ) 

Put an end to the priority recording mechanism.

Definition at line 353 of file metrics.cpp.

References priority_file_1, priority_file_2, priority_file_3, and save_priorities.

Referenced by best_first_search().

void stop_recording() {
  if (save_priorities) {
    fclose(priority_file_1);
    fclose(priority_file_2);
    fclose(priority_file_3);
  }
}


Variable Documentation

int character_count

Definition at line 54 of file metrics.cpp.

Referenced by init_metrics(), save_summary(), and write_results().

int chars_classified

Global counter, used in classify_blob()

Definition at line 57 of file metrics.cpp.

Referenced by init_metrics(), and save_summary().

int chops_attempted1

chops tried, 1st pass of attempt_blob_chop()

Definition at line 49 of file metrics.cpp.

Referenced by attempt_blob_chop(), init_metrics(), and save_summary().

int chops_attempted2

chops tried, 2nd pass of attempt_blob_chop()

Definition at line 51 of file metrics.cpp.

Referenced by attempt_blob_chop(), init_metrics(), and save_summary().

int chops_performed1

chops actually done, 1st pass of attempt_blob_chop()

Definition at line 50 of file metrics.cpp.

Referenced by improve_by_chopping(), init_metrics(), and save_summary().

int chops_performed2

chops actually done, 2nd pass of attempt_blob_chop()

Definition at line 52 of file metrics.cpp.

Referenced by improve_by_chopping(), init_metrics(), and save_summary().

MEASUREMENT match_priority_range

Definition at line 64 of file metrics.cpp.

Referenced by record_samples().

int permutation_count

Global counter, used in permute_characters()

Definition at line 39 of file permute.cpp.

Referenced by init_metrics(), permute_characters(), and save_summary().

MEASUREMENT width_measure

Only used in prioritize_state

Definition at line 60 of file metrics.cpp.

Referenced by reset_width_tally().

MEASUREMENT width_priority_range

Help to normalize

Definition at line 63 of file metrics.cpp.

Referenced by record_samples().

int word_count

Definition at line 55 of file metrics.cpp.

Referenced by apply_box_testing(), eval_word_spacing(), init_metrics(), make_prop_words(), recog_all_words(), save_summary(), and write_results().

int words_chopped1

Note:
File: metrics.h (Formerly metrics.h)
Statistics stuff
Author:
Mark Seaman, SW Productivity
Date:
Fri Oct 16 14:37:00 1987 Tue Jul 30 17:02:48 1991 (Mark Seaman) marks
(c) Copyright 1987, Hewlett-Packard Company.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Definition at line 47 of file metrics.cpp.

Referenced by chop_word_main(), init_metrics(), and save_summary().

int words_chopped2

Definition at line 48 of file metrics.cpp.

Referenced by chop_word_main(), init_metrics(), and save_summary().


Generated on Wed Feb 28 19:49:28 2007 for Tesseract by  doxygen 1.5.1