The Term Harvest Expert generates terminology databases by analysing and identifying terms from within Alchemy projects.  Through a combination of statistical methods, content examination and optional manual override, the process is very accurate.

Projects containing user interface segments are typically rich in term candidates.  That said, such projects are also likely to contain phrases and segments that are not valid terms.  The job of the Term Harvest Expert is to separate the wheat from the chaff.  The Expert takes a statistical approach to identifying terminology and generating a database containing those terms.


Note: Term Harvest generates *.tbx or *.csv files containing terminology.

The Expert starts with a complete list of segments from the Term Candidate files.  It then searches these segments for sub-segment phrases based on delimiters (customisable in the Options Tab), thus increasing the pool of possible terms.  With this increased set of candidates, the Term Harvest Expert then searches the Frequency Analysis files for uses of the candidates.  Each use of a candidate increases the likelihood of it being a term.

Press the Analyse Candidates button in the Candidates Tab to view this frequency analysis along with other statistics about each candidate.  The resulting grid is used to accurately identify terms.
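The frequency analysis described above can be illustrated with a minimal Python sketch.  This is not the Expert's actual implementation: the function name count_candidate_uses, the sample data and the case-insensitive matching are all assumptions made for illustration only.

```python
from collections import Counter

def count_candidate_uses(candidates, reference_segments):
    """Count how often each candidate occurs in the reference segments.

    Hypothetical helper: matching is case-insensitive plain substring
    counting, which is an assumption rather than documented behaviour.
    """
    counts = Counter()
    for candidate in candidates:
        needle = candidate.lower()
        counts[candidate] = sum(
            segment.lower().count(needle) for segment in reference_segments
        )
    return counts

# Invented sample data standing in for candidate and frequency analysis files.
candidates = ["Translation Memory", "Please select your TM"]
reference_segments = [
    "The Translation Memory stores previous translations.",
    "Open a Translation Memory before you leverage.",
]

counts = count_candidate_uses(candidates, reference_segments)
for candidate, uses in counts.most_common():
    print(f"{candidate}: {uses}")
```

In this sketch a candidate that appears often in the reference content ranks above one that never appears, which mirrors the idea that each use of a candidate increases the likelihood of it being a term.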

To launch the Expert, choose it from the Tools menu.

Files Tab

Use the Files Tab to supply input files and specify the desired output.

Input Files

The process takes a project file that is likely to contain terms.  Use the Term Candidate Files area to select the file or files that contain the terms.  If the supplied candidate files are untranslated, the output will be untranslated.  If the supplied candidate files are translated, the resulting terminology database will be bilingual.

Use the Frequency Analysis Files area to select the file or files where the terms are likely to be used.  The Expert takes any term candidates identified in the candidate files and searches to see how frequently they are used in this content.  The more frequently a term candidate is mentioned in these reference files, the more likely it is to be a valid term.

Output Files

trem_harvest_output.jpg

Use the Output Files area to specify the location for the resulting termbase.  Choose either a TermBase eXchange file (*.tbx) or a comma-separated values file (*.csv).

The Term Harvest Expert can optionally store the segments and terms that it excluded.  To generate a file containing the excluded terms, select the Export Excluded Terms File check-box.  The filename is generated automatically from the terminology database filename, with the suffix _excluded appended.

The options used to generate the terminology database may also be saved by the Term Harvest Expert.  Select the Export User Settings File option to store the settings used during the term analysis.  The filename is generated automatically from the terminology database filename: the suffix _settings is appended and the file has an *.ini extension.
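As a rough illustration of the naming scheme described above, the following Python sketch derives the two companion filenames from a termbase filename.  The function name companion_files and the sample filename are hypothetical, and the assumption that the excluded terms file keeps the same extension as the termbase is not stated in this documentation.

```python
from pathlib import Path

def companion_files(termbase_path):
    """Return (excluded_file, settings_file) for a given termbase file.

    Hypothetical helper.  Assumption: the excluded terms file keeps the
    termbase extension; the settings file uses the .ini extension.
    """
    termbase = Path(termbase_path)
    excluded = termbase.with_name(termbase.stem + "_excluded" + termbase.suffix)
    settings = termbase.with_name(termbase.stem + "_settings.ini")
    return excluded, settings

# Invented example filename.
excluded, settings = companion_files("glossary.tbx")
print(excluded.name)
print(settings.name)
```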

Options Tab

Use the Options Tab to control what content types are considered when searching for term candidates.  User interface content such as dialog boxes and menu resources is rich in term candidates; however, all resource types can be considered.  It is better to include everything and rely on the controls in the Candidates Tab to accurately identify the terms.

To maximise the pool of possible term candidates, the Term Harvest Expert considers full segments, but also searches sub-segments for possible terms.  It does this by looking for delimiters within segments that may denote actual terms.  Use the Candidate Delimiters control in the Options Tab to edit the list of candidate delimiters.  In the following example, parentheses () would make good candidate delimiters as they surround the term 'Translation Memory'.  e.g. "Please select your TM (Translation Memory)"
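The effect of candidate delimiters can be sketched in Python as follows.  This is only an approximation for illustration: the function name sub_segments is hypothetical, and treating each delimiter as a single character that splits the segment is an assumption about how the Expert behaves.

```python
import re

def sub_segments(segment, delimiters="()"):
    """Split a segment into sub-segment candidates at any delimiter character.

    Hypothetical helper; single-character delimiters are an assumption.
    """
    if not delimiters:
        return [segment.strip()]
    # Build a character class that matches any one of the delimiters.
    pattern = "[" + re.escape(delimiters) + "]"
    parts = [part.strip() for part in re.split(pattern, segment)]
    # Drop empty pieces left over at the edges of the segment.
    return [part for part in parts if part]

segment = "Please select your TM (Translation Memory)"
candidates = [segment] + sub_segments(segment)
print(candidates)
```

Here the full segment is kept as a candidate and the pieces found between delimiters, including 'Translation Memory', are added to the pool.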

 

Candidates Tab

The Candidates tab facilitates accurate identification of terminology from the term candidate list.  See more information.

term_harvest_candidates_grid.jpg