By I. Dan Melamed
Parallel texts (bitexts) are a goldmine of linguistic knowledge, because the translation of a text into another language can be viewed as a detailed annotation of what that text means. Knowledge about translational equivalence, which can be gleaned from bitexts, is of central importance for applications such as manual and machine translation, cross-language information retrieval, and corpus linguistics. The availability of bitexts has increased dramatically since the advent of the Web, making their study an exciting new area of research in natural language processing. This book lays out the theory and the practical techniques for discovering and applying translational equivalence at the lexical level. It is a start-to-finish guide to designing and evaluating many translingual applications.
Excerpts from Empirical Methods for Exploiting Parallel Texts
The translation lexicons and stop-lists had been developed independently of the training and test bitexts. The French/English part of the evaluation was performed on bitexts from the publicly available corpus de bi-texte anglais-français (BAF) (Simard & Plamondon, 1996). This distribution can be compared to the error distributions reported for the same test set by Dagan et al. (1993b). Dagan et al. […] These distances were measured horizontally from the bitext map rather than perpendicularly to the main diagonal.
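The last sentence above contrasts two ways of measuring how far a point lies from where it should be in bitext space. The sketch below makes both concrete under stated assumptions: the bitext map is a list of (x, y) points sorted by y and joined by straight lines, x and y are positions in the two halves of the bitext, and a point beyond either end of the map is measured against the nearest endpoint. The function names `horizontal_error` and `diagonal_distance` are hypothetical and do not come from the book.

```python
import math
from bisect import bisect_left
from typing import List, Tuple

Point = Tuple[float, float]


def horizontal_error(point: Point, bitext_map: List[Point]) -> float:
    """Horizontal (x-axis) distance from `point` to a piecewise-linear
    bitext map given as (x, y) points sorted by y. Outside the map's
    y-range, the nearest endpoint is used (an assumed convention)."""
    x, y = point
    ys = [p[1] for p in bitext_map]
    i = bisect_left(ys, y)  # index of the first map point with y-coordinate >= y
    if i == 0:
        x_map = bitext_map[0][0]
    elif i == len(bitext_map):
        x_map = bitext_map[-1][0]
    else:
        (x0, y0), (x1, y1) = bitext_map[i - 1], bitext_map[i]
        # Here y0 < y <= y1, so the division is safe; interpolate linearly.
        x_map = x0 + (x1 - x0) * (y - y0) / (y1 - y0)
    return abs(x - x_map)


def diagonal_distance(point: Point, x_len: float, y_len: float) -> float:
    """Perpendicular (Euclidean) distance from `point` to the main diagonal,
    i.e. the line from (0, 0) to (x_len, y_len)."""
    x, y = point
    return abs(y_len * x - x_len * y) / math.hypot(x_len, y_len)


print(horizontal_error((12, 20), [(0, 0), (10, 20), (30, 40)]))  # 2.0
```

One plausible reason to prefer the horizontal measure is that it stays in the units of the x-axis text (for example, characters), whereas the perpendicular distance mixes positions from both texts.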
Implementation of SIMR for New Language Pairs

SIMR can be ported to a new language pair in three steps.

Step 1: Construct Matching Predicate

The original SIMR implementation for French/English included matching predicates that could use cognates and/or translation lexicons. For language pairs in which lexical cognates are frequent, a cognate-based matching predicate should suffice. In other cases, SIMR can use a seed translation lexicon to boost the number of candidate points of correspondence produced in the generation phase.
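A matching predicate of the kind Step 1 describes can be sketched as a function that accepts a token pair if the pair appears in a seed translation lexicon or passes a cognate test. This is an illustration under assumptions, not SIMR's code: the names `make_matching_predicate`, `prefix_cognate` and `candidate_points` are hypothetical, the four-character-prefix cognate test is only a crude placeholder, and `candidate_points` pairs every token with every other token instead of restricting the search to any part of the bitext space.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical seed translation lexicon: (source word, target word) pairs in lower case.
Lexicon = Set[Tuple[str, str]]
Predicate = Callable[[str, str], bool]


def prefix_cognate(a: str, b: str, k: int = 4) -> bool:
    """Crude placeholder cognate test: both tokens have at least k characters
    and share their first k characters (case-insensitive)."""
    return len(a) >= k and len(b) >= k and a[:k].lower() == b[:k].lower()


def make_matching_predicate(
    lexicon: Optional[Lexicon] = None,
    cognate_test: Optional[Predicate] = None,
) -> Predicate:
    """Return a predicate that accepts a token pair if it is listed in the
    seed lexicon or, failing that, passes the cognate test."""
    entries = lexicon or set()

    def matches(src: str, tgt: str) -> bool:
        if (src.lower(), tgt.lower()) in entries:
            return True
        return cognate_test is not None and cognate_test(src, tgt)

    return matches


def candidate_points(
    src_tokens: List[str], tgt_tokens: List[str], matches: Predicate
) -> List[Tuple[int, int]]:
    """Every (i, j) token-position pair that the predicate accepts."""
    return [
        (i, j)
        for i, s in enumerate(src_tokens)
        for j, t in enumerate(tgt_tokens)
        if matches(s, t)
    ]


matches = make_matching_predicate({("dog", "chien")}, prefix_cognate)
print(candidate_points(["the", "dog", "parliament"], ["le", "chien", "parlement"], matches))
# [(1, 1), (2, 2)]: one point from the lexicon, one from the cognate test
```

Dropping the lexicon argument gives a purely cognate-based predicate; dropping the cognate test gives a purely lexicon-based one.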
These examples suggest that a more accurate cognate-matching criterion can be based on approximate string matching. For example, McEnery & Oakes (1995) threshold the Dice coefficient of matching character bigrams in each pair of candidate cognates. The matching predicates in SIMR's current implementation instead threshold the Longest Common Subsequence Ratio (LCSR). The LCSR of two tokens is the ratio of the length of their longest (not necessarily contiguous) common subsequence (LCS) to the length of the longer token.
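Both string-similarity measures mentioned above fit in a few lines. `lcsr` follows the definition in the text (LCS length divided by the length of the longer token). `dice_bigrams` is one reading of the Dice coefficient over character bigrams, assuming bigrams are counted as a multiset. The function names, the empty-string conventions, and the threshold in `lcsr_match` are assumptions for illustration; the sketch does not reproduce SIMR's actual threshold.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence,
    by dynamic programming in O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the prefix of a processed so far
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length / length of the longer token.
    Two empty tokens get 0.0 (an arbitrary convention)."""
    longer = max(len(a), len(b))
    return lcs_length(a, b) / longer if longer else 0.0


def char_bigrams(s: str) -> Counter:
    """Multiset of adjacent character pairs in s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_bigrams(a: str, b: str) -> float:
    """Dice coefficient over character bigrams: 2|A & B| / (|A| + |B|)."""
    ba, bb = char_bigrams(a), char_bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    shared = sum((ba & bb).values())  # multiset intersection keeps minimum counts
    return 2 * shared / total


def lcsr_match(a: str, b: str, threshold: float) -> bool:
    """Cognate-style matching predicate: accept the pair if LCSR >= threshold."""
    return lcsr(a, b) >= threshold


print(lcsr("government", "gouvernement"))  # 10/12, about 0.833
print(dice_bigrams("night", "nacht"))      # 0.25
```

All ten letters of *government* occur in order in *gouvernement*, so the LCS has length 10 and the LCSR is 10/12. *night* and *nacht* share only the bigram *ht* out of eight bigrams in total, which gives a Dice coefficient of 2 × 1 / 8.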