classify/cluster.h File Reference

#include "kdtree.h"
#include "oldlist.h"

Go to the source code of this file.

Classes

Defines

Typedefs

Enumerations

Functions


Define Documentation

#define ALREADYCLUSTERED   4000

Definition at line 200 of file cluster.h.

Referenced by MakeSample().

#define InitSampleSearch ( S,
 )     (((C)==NULL)?(S=NIL):(S=push(NIL,(C))))

low level cluster tree analysis routines.

Definition at line 175 of file cluster.h.

Referenced by ComputeStatistics(), FillBuckets(), and MultipleCharSamples().


Typedef Documentation

CLUSTER

Type sample, details on subclusters that make up a sample.

PROTOTYPE

Type proto, info & statistics on one prototype.

SAMPLE

Can refer to either sample or cluster.

Definition at line 58 of file cluster.h.


Enumeration Type Documentation

enum DISTRIBUTION

FIX: type of distribution for each dimension.

Enumerator:
normal 
uniform 
D_random 

Definition at line 90 of file cluster.h.

00090              {
00091   normal, uniform, D_random
00092 } DISTRIBUTION;

enum PROTOSTYLE

FIX: Style of prototype.

Enumerator:
spherical 
elliptical 
mixed 
automatic 

Definition at line 64 of file cluster.h.

00064              {
00065   spherical, elliptical, mixed, automatic
00066 } PROTOSTYLE;


Function Documentation

LIST ClusterSamples ( CLUSTERER Clusterer,
CLUSTERCONFIG Config 
)

If unclustered, put samples of KD tree into cluster tree and compute its prototypes.

Parameters:
Clusterer data struct containing samples to be clustered
Config parameters which control clustering process
Returns:
Pointer to a list of prototypes
First checks to see if the samples in this clusterer have already been clustered before; if so, it does not bother to recreate the cluster tree. It simply recomputes the prototypes based on the new Config info.

If the samples have not been clustered before, the samples in the KD tree are formed into a cluster tree and then the prototypes are computed from the cluster tree.

In either case this routine returns a pointer to a list of prototypes that best represent the samples given the constraints specified in Config.

Note:
Exceptions: None
Date:
5/29/89, DSJ, Created.

Definition at line 557 of file cluster.cpp.

References ComputePrototypes(), Config, CreateClusterTree(), FreeProtoList(), NULL, and CLUSTERER::Root.

00557                                                                  {
00558   //only create cluster tree if samples have never been clustered before
00559   if (Clusterer->Root == NULL)
00560     CreateClusterTree(Clusterer);
00561 
00562   //deallocate the old prototype list if one exists
00563   FreeProtoList (&Clusterer->ProtoList);
00564   Clusterer->ProtoList = NIL;
00565 
00566   //compute prototypes starting at the root node in the tree
00567   ComputePrototypes(Clusterer, Config);
00568   return (Clusterer->ProtoList);
00569 }                                // ClusterSamples

void FreeClusterer ( CLUSTERER Clusterer  ) 

Free memory for specified structure without affecting prototype list.

Parameters:
Clusterer pointer to data structure to be freed
Returns:
None
Frees all of the memory allocated to the specified data structure. It will not, however, free the memory used by the prototype list. The pointers to the clusters for each prototype in the list will be set to NULL to indicate that the cluster data structures no longer exist. Any sample lists that have been obtained via calls to GetSamples are no longer valid.
Note:
Exceptions: None
Date:
6/6/89, DSJ, Created.

Definition at line 588 of file cluster.cpp.

References first, FreeCluster(), FreeKDTree(), iterate, CLUSTERER::KDTree, memfree(), NULL, CLUSTERER::ParamDesc, CLUSTERER::ProtoList, and CLUSTERER::Root.

00588                                          {
00589   if (Clusterer != NULL) {
00590     memfree (Clusterer->ParamDesc);
00591     if (Clusterer->KDTree != NULL)
00592       FreeKDTree (Clusterer->KDTree);
00593     if (Clusterer->Root != NULL)
00594       FreeCluster (Clusterer->Root);
00595     iterate (Clusterer->ProtoList) {
00596       ((PROTOTYPE *) (first (Clusterer->ProtoList)))->Cluster = NULL;
00597     }
00598     memfree(Clusterer);
00599   }
00600 }                                // FreeClusterer

void FreeProtoList ( LIST ProtoList  ) 

Frees all of the memory allocated to the specified list of prototypes.

Parameters:
ProtoList pointer to list of prototypes to be freed
Returns:
None
The clusters which are pointed to by the prototypes are not freed.
Note:
Exceptions: None
Date:
6/6/89, DSJ, Created.

Definition at line 614 of file cluster.cpp.

References destroy_nodes(), and FreePrototype().

Referenced by ClusterSamples(), and FreeNormProtos().

00614                                     {
00615   destroy_nodes(*ProtoList, FreePrototype);
00616 }                                // FreeProtoList

void FreePrototype ( void *  arg  ) 

Deallocates the memory consumed by the specified prototype and modifies the corresponding cluster so that it is no longer marked as a prototype.

Parameters:
arg Prototype data structure to be deallocated
Returns:
None
The cluster is NOT deallocated by this routine.
Note:
Exceptions: None
Date:
5/30/89, DSJ, Created.

Definition at line 631 of file cluster.cpp.

References proto::Cluster, proto::Distrib, FLOATUNION::Elliptical, FALSE, proto::Magnitude, proto::Mean, memfree(), NULL, sample::Prototype, spherical, proto::Style, proto::Variance, and proto::Weight.

Referenced by FreeProtoList(), and MakeMixedProto().

00631                               {  //PROTOTYPE     *Prototype)
00632   PROTOTYPE *Prototype = (PROTOTYPE *) arg;
00633 
00634   // unmark the corresponding cluster (if there is one
00635   if (Prototype->Cluster != NULL)
00636     Prototype->Cluster->Prototype = FALSE;
00637 
00638   // deallocate the prototype statistics and then the prototype itself
00639   if (Prototype->Distrib != NULL)
00640     memfree (Prototype->Distrib);
00641   if (Prototype->Mean != NULL)
00642     memfree (Prototype->Mean);
00643   if (Prototype->Style != spherical) {
00644     if (Prototype->Variance.Elliptical != NULL)
00645       memfree (Prototype->Variance.Elliptical);
00646     if (Prototype->Magnitude.Elliptical != NULL)
00647       memfree (Prototype->Magnitude.Elliptical);
00648     if (Prototype->Weight.Elliptical != NULL)
00649       memfree (Prototype->Weight.Elliptical);
00650   }
00651   memfree(Prototype);
00652 }                                // FreePrototype

CLUSTERER* MakeClusterer ( INT16  SampleSize,
PARAM_DESC  ParamDesc[] 
)

Creates a new clusterer data structure, initializes it, and returns a pointer to it.

Parameters:
SampleSize number of dimensions in feature space
ParamDesc description of each dimension
Returns:
pointer to the new clusterer data structure
Note:
Exceptions: None
Date:
5/29/89, DSJ, Created.

Definition at line 442 of file cluster.cpp.

References PARAM_DESC::Circular, Emalloc(), PARAM_DESC::HalfRange, CLUSTERER::KDTree, MakeKDTree(), PARAM_DESC::Max, PARAM_DESC::MidRange, PARAM_DESC::Min, PARAM_DESC::NonEssential, NULL, CLUSTERER::NumberOfSamples, CLUSTERER::NumChar, CLUSTERER::ParamDesc, CLUSTERER::ProtoList, PARAM_DESC::Range, CLUSTERER::Root, and CLUSTERER::SampleSize.

Referenced by SetUpForClustering().

00442                                                          {
00443   CLUSTERER *Clusterer;
00444   int i;
00445 
00446   // allocate main clusterer data structure and init simple fields
00447   Clusterer = (CLUSTERER *) Emalloc (sizeof (CLUSTERER));
00448   Clusterer->SampleSize = SampleSize;
00449   Clusterer->NumberOfSamples = 0;
00450   Clusterer->NumChar = 0;
00451 
00452   // init fields which will not be used initially
00453   Clusterer->Root = NULL;
00454   Clusterer->ProtoList = NIL;
00455 
00456   // maintain a copy of param descriptors in the clusterer data structure
00457   Clusterer->ParamDesc =
00458     (PARAM_DESC *) Emalloc (SampleSize * sizeof (PARAM_DESC));
00459   for (i = 0; i < SampleSize; i++) {
00460     Clusterer->ParamDesc[i].Circular = ParamDesc[i].Circular;
00461     Clusterer->ParamDesc[i].NonEssential = ParamDesc[i].NonEssential;
00462     Clusterer->ParamDesc[i].Min = ParamDesc[i].Min;
00463     Clusterer->ParamDesc[i].Max = ParamDesc[i].Max;
00464     Clusterer->ParamDesc[i].Range = ParamDesc[i].Max - ParamDesc[i].Min;
00465     Clusterer->ParamDesc[i].HalfRange = Clusterer->ParamDesc[i].Range / 2;
00466     Clusterer->ParamDesc[i].MidRange =
00467       (ParamDesc[i].Max + ParamDesc[i].Min) / 2;
00468   }
00469 
00470   // allocate a kd tree to hold the samples
00471   Clusterer->KDTree = MakeKDTree (SampleSize, ParamDesc);
00472 
00473   // execute hook for monitoring clustering operation
00474   // (*ClustererCreationHook)( Clusterer );
00475 
00476   return (Clusterer);
00477 }                                // MakeClusterer

SAMPLE* MakeSample ( CLUSTERER Clusterer,
FLOAT32  Feature[],
INT32  CharID 
)

Creates new sample data structure for specified feature.

Parameters:
Clusterer clusterer data structure to add sample to
Feature feature to be added to clusterer
CharID unique ident. of char that sample came from
Returns:
Pointer to the new sample data structure
Creates a new sample data structure to hold the specified feature. This sample is added to the clusterer data structure (so that it knows which samples are to be clustered later), and a pointer to the sample is returned to the caller.

Exceptions: ALREADYCLUSTERED MakeSample can't be called after ClusterSamples has been called

Date:
5/29/89, DSJ, Created.

Definition at line 498 of file cluster.cpp.

References ALREADYCLUSTERED, sample::CharID, sample::Clustered, DoError(), Emalloc(), FALSE, KDStore(), CLUSTERER::KDTree, sample::Left, sample::Mean, NULL, CLUSTERER::NumberOfSamples, CLUSTERER::NumChar, sample::Prototype, sample::Right, CLUSTERER::Root, sample::SampleCount, and CLUSTERER::SampleSize.

Referenced by SetUpForClustering().

00498                                                                     {
00499   SAMPLE *Sample;
00500   int i;
00501 
00502   // see if the samples have already been clustered - if so trap an error
00503   if (Clusterer->Root != NULL)
00504     DoError (ALREADYCLUSTERED,
00505       "Can't add samples after they have been clustered");
00506 
00507   // allocate the new sample and initialize it
00508   Sample = (SAMPLE *) Emalloc (sizeof (SAMPLE) +
00509     (Clusterer->SampleSize -
00510     1) * sizeof (FLOAT32));
00511   Sample->Clustered = FALSE;
00512   Sample->Prototype = FALSE;
00513   Sample->SampleCount = 1;
00514   Sample->Left = NULL;
00515   Sample->Right = NULL;
00516   Sample->CharID = CharID;
00517 
00518   for (i = 0; i < Clusterer->SampleSize; i++)
00519     Sample->Mean[i] = Feature[i];
00520 
00521   // add the sample to the KD tree - keep track of the total # of samples
00522   Clusterer->NumberOfSamples++;
00523   KDStore (Clusterer->KDTree, Sample->Mean, (char *) Sample);
00524   if (CharID >= Clusterer->NumChar)
00525     Clusterer->NumChar = CharID + 1;
00526 
00527   // execute hook for monitoring clustering operation
00528   // (*SampleCreationHook)( Sample );
00529 
00530   return (Sample);
00531 }                                // MakeSample

FLOAT32 Mean ( PROTOTYPE Proto,
UINT16  Dimension 
)

Computes the mean of the specified prototype in the indicated dimension.

Parameters:
Proto prototype to return mean of
Dimension dimension whose mean is to be returned
Returns:
Mean of Prototype in Dimension
Note:
Exceptions: none
Date:
7/6/89, DSJ, Created.

Definition at line 698 of file cluster.cpp.

References proto::Mean.

Referenced by PrintNormMatch(), and UniformCertainties().

00698                                                  {
00699   return (Proto->Mean[Dimension]);
00700 }                                // Mean

CLUSTER* NextSample ( LIST SearchState  ) 

Finds all of the samples which belong to a cluster.

Parameters:
SearchState ptr to list containing clusters to be searched
Returns:
Pointer to the next leaf cluster (sample) or NULL.
Starts by removing the top cluster on the cluster list (SearchState). If this cluster is a leaf it is returned. Otherwise, the right subcluster is pushed on the list and we continue the search in the left subcluster. This continues until a leaf is found. If all samples have been found, NULL is returned.

See also:
InitSampleSearch() must be called before

NextSample() to initialize the search.

Note:
Exceptions: None
Date:
6/16/89, DSJ, Created.

Definition at line 672 of file cluster.cpp.

References first, sample::Left, NULL, pop(), push(), and TRUE.

Referenced by ComputeStatistics(), FillBuckets(), and MultipleCharSamples().

00672                                        {
00673   CLUSTER *Cluster;
00674 
00675   if (*SearchState == NIL)
00676     return (NULL);
00677   Cluster = (CLUSTER *) first (*SearchState);
00678   *SearchState = pop (*SearchState);
00679   while (TRUE) {
00680     if (Cluster->Left == NULL)
00681       return (Cluster);
00682     *SearchState = push (*SearchState, Cluster->Right);
00683     Cluster = Cluster->Left;
00684   }
00685 }                                // NextSample

FLOAT32 StandardDeviation ( PROTOTYPE Proto,
UINT16  Dimension 
)

Computes standard deviation of prototype in the indicated dimension.

Parameters:
Proto prototype to return standard deviation of
Dimension dimension whose stddev is to be returned
Returns:
Standard deviation of Prototype in Dimension
Note:
Exceptions: none
Date:
7/6/89, DSJ, Created.

Definition at line 712 of file cluster.cpp.

References D_random, proto::Distrib, FLOATUNION::Elliptical, elliptical, mixed, normal, FLOATUNION::Spherical, spherical, proto::Style, uniform, and proto::Variance.

Referenced by PrintNormMatch().

00712                                                               {
00713   switch (Proto->Style) {
00714     case spherical:
00715       return ((FLOAT32) sqrt ((double) Proto->Variance.Spherical));
00716     case elliptical:
00717       return ((FLOAT32)
00718         sqrt ((double) Proto->Variance.Elliptical[Dimension]));
00719     case mixed:
00720       switch (Proto->Distrib[Dimension]) {
00721         case normal:
00722           return ((FLOAT32)
00723             sqrt ((double) Proto->Variance.Elliptical[Dimension]));
00724         case uniform:
00725         case D_random:
00726           return (Proto->Variance.Elliptical[Dimension]);
00727       }
00728   }
00729   return 0.0f;
00730 }                                // StandardDeviation


Generated on Wed Feb 28 19:49:18 2007 for Tesseract by  doxygen 1.5.1