With valid Term Candidate and Frequency Analysis files provided in the Files tab, pressing Analyse Candidates on the Candidates tab produces a grid such as the one below.

Potential terms are listed in the Source column along with their translation, if present, and the statistical information needed to help identify terminology.  If the Term Candidate file provided in the Files tab is a translated project, the Translation column displays the translation of every complete segment listed.  The Term Harvest Expert considers complete segments, but it also searches for Candidate Delimiters within each segment in order to maximize the list of possible terms.  When a term is found within a segment in this way, no translation is shown, because the expert has no way of identifying which part of the translation corresponds to the term found in the source segment.

term_harvest_candidates_grid.jpg
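
The way partial candidates lose their translation can be illustrated with a short sketch. This is not the expert's actual implementation: the Candidate class, the harvest function and the delimiter string are hypothetical names, and splitting on single delimiter characters is an assumption made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    source: str
    translation: Optional[str]  # None when the candidate is only part of a segment


def harvest(segment: str, translation: Optional[str], delimiters: str) -> List[Candidate]:
    """Return the complete segment plus any sub-terms found between delimiters."""
    # The complete segment keeps its translation (if the project is translated).
    candidates = [Candidate(segment, translation)]
    if not delimiters:
        return candidates
    # Split on any single delimiter character; re.escape keeps characters
    # such as ']' or '-' literal inside the character class.
    parts = re.split("[" + re.escape(delimiters) + "]", segment)
    for part in parts:
        part = part.strip()
        # A partial match cannot be aligned with the translation, so none is stored.
        if part and part != segment:
            candidates.append(Candidate(part, None))
    return candidates
```

Under these assumptions, a segment containing one delimiter yields three rows: the complete segment with its translation, and two partial candidates with an empty Translation column.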

Identifying Terminology

Use the grid to select the term candidates that should be included in the termbase.  The segments to be included in the termbase are shown in green, while the rest appear grey.  

The State column indicates the manual/automatic status of each term candidate.  Each term candidate has a tri-state button.  Click this button once to force the candidate to be included; the button then shows a green tick mark.  Click again to force the candidate's exclusion, in which case the button shows a red X.  Click a third time to return the candidate to automatic status.  Automatic status is denoted by the double arrow symbol and means that the candidate's include/exclude status is governed by the statistical and content settings below.

term_harvest_auto_grid.jpg
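
The click cycle of the State button and its effect on inclusion can be summarised in a minimal sketch. The State enum and the click and is_included functions are hypothetical names chosen for illustration; they do not come from the application.

```python
from enum import Enum


class State(Enum):
    AUTO = "auto"        # double arrow: governed by the settings
    INCLUDE = "include"  # green tick: forced inclusion
    EXCLUDE = "exclude"  # red X: forced exclusion


# Each click moves the button to the next state in the cycle.
_NEXT = {
    State.AUTO: State.INCLUDE,
    State.INCLUDE: State.EXCLUDE,
    State.EXCLUDE: State.AUTO,
}


def click(state: State) -> State:
    """Return the state shown after one click on the button."""
    return _NEXT[state]


def is_included(state: State, passes_settings: bool) -> bool:
    """Manual states override the settings; AUTO defers to them."""
    if state is State.INCLUDE:
        return True
    if state is State.EXCLUDE:
        return False
    return passes_settings
```

Here passes_settings stands for the combined outcome of the statistical and content controls; a forced state ignores it entirely.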

Statistical Control

The controls in this section relate to the numerical qualities of term candidates.  Each control can be enabled or disabled with the check-box to its left.  As each control is made active or inactive, its effect can be seen in the summary above the grid (e.g. Included Terms 294 of 2328).

term_harvest_statistics.jpg

To control a term candidate's inclusion based on how often it is used, enable the Required Frequency control.   If a term is found in the frequency analysis files fewer times than the set threshold, it is excluded and appears grey.  Provided the frequency analysis files relate to the terms in the term candidate list, this is an effective way to identify genuine terms.  

To control a term candidate's inclusion based on the number of words in the term, enable the Maximum Word Count control.   If a term candidate has more words than the set limit, then it is excluded and appears grey.  The word count limit is useful for excluding candidates that are more likely to be sentences or phrases than terms.

To exclude term candidates that are too short, enable the Minimum Character Count control.   If a term is shorter than the provided threshold, it is excluded and appears grey.  This is useful for excluding very short words such as 'in', 'on', 'at', 'the', etc.

To exclude terms that contain very long words, enable the Maximum Characters in a Word control.  When dealing with software user interface content, this option can be useful for excluding candidates that relate to the working of an application such as variable names (e.g. maximumSizeValue).
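
As a rough illustration of how the four statistical controls combine, the sketch below treats each control as an optional threshold, where None means the check-box is cleared. The class, field and function names are hypothetical, and details such as counting spaces towards the character count are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatisticalSettings:
    # None means the corresponding control is disabled.
    required_frequency: Optional[int] = None
    max_word_count: Optional[int] = None
    min_char_count: Optional[int] = None
    max_chars_in_word: Optional[int] = None


def passes_statistics(term: str, frequency: int, s: StatisticalSettings) -> bool:
    """Return True if the candidate survives every enabled statistical control."""
    words = term.split()
    # Required Frequency: found fewer times than the threshold.
    if s.required_frequency is not None and frequency < s.required_frequency:
        return False
    # Maximum Word Count: more words than the limit.
    if s.max_word_count is not None and len(words) > s.max_word_count:
        return False
    # Minimum Character Count: assumed here to count the whole term, spaces included.
    if s.min_char_count is not None and len(term) < s.min_char_count:
        return False
    # Maximum Characters in a Word: any single word longer than the limit.
    if s.max_chars_in_word is not None and any(len(w) > s.max_chars_in_word for w in words):
        return False
    return True
```

With a frequency threshold of 2, a word limit of 4, a minimum of 3 characters and a maximum of 15 characters per word, a candidate such as maximumSizeValue (16 characters) would be excluded, as would 'in'.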

Content Control

The controls in this section relate to the content of the term candidates.  Each control can be enabled or disabled with the check-box to its left.  As each control is made active or inactive, its effect can be seen in the summary above the grid (e.g. Included Terms 294 of 2328).

term_harvest_content.jpg

Use the Exclude Segments with only lower-case control to exclude segments that do not appear to be terms because they contain no upper case letters.  Valid terms are more likely to contain a mixture of upper and lower case letters (e.g. Leverage Expert).

Use the Exclude Segments with only UPPER-CASE control to exclude segments that consist entirely of upper case letters.  Valid terms are more likely to contain a mixture of upper and lower case letters (e.g. Leverage Expert).

The Exclude Segments containing control excludes term candidates that contain any of the specified characters.  Characters such as =, +, ~ and # can occur in term candidates, but their presence usually indicates that the candidate is not a valid term.  Extend the list in the edit field to exclude candidates containing other characters.  Press the Apply button to see the effect of the changes.

The Exclude Accelerator Text control ensures that terms such as Ctrl+A and Ctrl+Shift+T are excluded automatically.  
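
The content controls can be sketched in the same spirit. Everything named here is hypothetical, including the ACCELERATOR pattern, which is only one plausible way to recognise accelerator text; the application's own detection rules are not documented in this section.

```python
import re
from dataclasses import dataclass


@dataclass
class ContentSettings:
    exclude_lower_only: bool = False
    exclude_upper_only: bool = False
    excluded_chars: str = ""          # e.g. "=+~#"
    exclude_accelerators: bool = False


# Assumed shape of accelerator text: modifier keys joined by '+', then a key.
ACCELERATOR = re.compile(r"\b(?:Ctrl|Alt|Shift)(?:\+(?:Ctrl|Alt|Shift))*\+\w+")


def passes_content(term: str, s: ContentSettings) -> bool:
    """Return True if the candidate survives every enabled content control."""
    # str.islower()/isupper() look only at cased characters, so digits
    # and spaces do not affect the result.
    if s.exclude_lower_only and term.islower():
        return False
    if s.exclude_upper_only and term.isupper():
        return False
    # Exclude Segments containing: any one listed character is enough.
    if any(ch in term for ch in s.excluded_chars):
        return False
    if s.exclude_accelerators and ACCELERATOR.search(term):
        return False
    return True
```

Note that with '+' in the excluded character list, accelerator text would already be excluded by the Exclude Segments containing check in this sketch; the separate accelerator check matters when '+' is removed from that list.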

Advanced Override Settings

The Advanced Override Settings area gives the user final control: regular expressions entered here override the above settings to include or exclude term candidates as the user sees fit.

term_harvest_advanced.jpg
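
A regular expression override of this kind might behave as in the sketch below. The apply_override function and its parameters are hypothetical, and the precedence used here (an exclude pattern wins over an include pattern) is an assumption, since the section does not state how conflicting expressions are resolved.

```python
import re
from typing import Optional


def apply_override(
    term: str,
    automatic_result: bool,
    include_pattern: Optional[str] = None,
    exclude_pattern: Optional[str] = None,
) -> bool:
    """Let regular expressions override the statistical and content outcome."""
    # Assumption: exclusion takes precedence when both patterns match.
    if exclude_pattern and re.search(exclude_pattern, term):
        return False
    if include_pattern and re.search(include_pattern, term):
        return True
    # No override applies, so the automatic outcome stands.
    return automatic_result
```

For example, an include pattern such as (?i)\bexpert\b would keep any candidate containing the word "expert" regardless of case, even if the lower-case control had excluded it.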