By Jian-Yun Nie, Graeme Hirst
Searching for information is no longer limited to the user's native language; it increasingly extends to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a language different from that of the query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one must translate either the query or the documents from one language into the other. However, this translation problem is not identical to full-text machine translation (MT). The goal is not to produce a human-readable translation but a translation suitable for finding relevant documents, so specific translation methods are required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, and the problems that remain. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation, and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness of the different approaches is compared. It is shown that translation approaches specifically designed for CLIR can rival and even outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers will also find more technical details and discussions of the remaining research challenges. It is suitable for new researchers who intend to carry out research on CLIR.
Extra resources for Cross-language Information Retrieval (Synthesis Lectures on Human Language Technologies)
Readers may refer to Wong et al. (2009) for a more detailed description of Chinese processing.

Chinese and Word Segmentation

Chinese texts are written in ideograms (also called Chinese characters or ideographs). One distinctive characteristic of Chinese for IR purposes (compared to the Indo-European languages) is the absence of spaces delimiting words. For example, in a sentence such as 汶川地震灾区首批自建永久性农房建成入住, one would want to recognize the following words: 汶川 (Wenchuan), 地震 (earthquake), 灾区 (disaster area), 首批 (first batch), 自建 (self-constructed), 永久性 (permanent), 农房 (farmer's house), 建成 (constructed), 入住 (inhabited).
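The excerpt does not say which segmentation method is used. As an illustration only, here is a minimal sketch of forward maximum matching, a common dictionary-based baseline: scan left to right and, at each position, take the longest lexicon word that matches, falling back to a single character. The lexicon `LEXICON` is hypothetical and holds just the nine words listed above, and the input is simply those words concatenated.

```python
def forward_max_match(text, lexicon):
    """Greedy left-to-right segmentation: at each position, take the longest
    lexicon word that matches; if none matches, emit a single character."""
    max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical lexicon: only the words listed in the text.
LEXICON = {"汶川", "地震", "灾区", "首批", "自建", "永久性", "农房", "建成", "入住"}
SENTENCE = "汶川地震灾区首批自建永久性农房建成入住"

print(forward_max_match(SENTENCE, LEXICON))
```

Greedy longest-match is fast, but it can mis-segment when a longer lexicon entry straddles a true word boundary, which is why statistical segmenters are often preferred in practice.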
The representation problem is even more evident in cross-language information retrieval (CLIR) or multilingual information retrieval (MLIR), where queries and documents are written in different languages. How can we create the same or a similar internal representation for two texts that concern the same piece of information but are written in different languages? For example, how can we recognize that the following descriptions convey the same piece of information?
There was a major earthquake in Wenchuan, China, in 2008 (in English).
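One simple way to obtain a shared representation is to map both texts onto English index terms and compare the resulting bags of words. The sketch below rests entirely on our own assumptions: the French sentence, the tiny lexicon `FR_EN`, the stopword lists, and the use of Jaccard overlap are all hypothetical choices made for illustration, not the book's method.

```python
import re

# Hypothetical French->English lexicon, made up for this illustration.
FR_EN = {"grand": "major", "séisme": "earthquake", "frappé": "hit", "chine": "china"}
# Hypothetical, deliberately tiny stopword lists.
STOPWORDS = {"en": {"there", "was", "a", "in"}, "fr": {"un", "a", "en"}}


def tokens(text):
    # \w matches Unicode letters in Python 3, so "séisme" stays one token.
    return re.findall(r"\w+", text.lower())


def represent(text, lang):
    """Bag of English index terms: drop stopwords, translate French words,
    and pass unknown tokens (names, numbers) through unchanged."""
    terms = set()
    for tok in tokens(text):
        if tok in STOPWORDS[lang]:
            continue
        terms.add(FR_EN.get(tok, tok) if lang == "fr" else tok)
    return terms


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


english = "There was a major earthquake in Wenchuan, China, in 2008"
french = "Un grand séisme a frappé Wenchuan, en Chine, en 2008"  # hypothetical

en_terms = represent(english, "en")
fr_terms = represent(french, "fr")
# Baseline: French terms without translation share only names and numbers.
fr_untranslated = {t for t in tokens(french) if t not in STOPWORDS["fr"]}

print(jaccard(en_terms, fr_untranslated))  # 2 shared terms out of 9
print(jaccard(en_terms, fr_terms))         # 5 shared terms out of 6
```

Without translation, only language-independent tokens such as "wenchuan" and "2008" overlap; after mapping into a common vocabulary, the two descriptions become nearly identical.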
The translations of “drug” are underlined in the examples, and we also indicate whether each translation is correct. In some cases, a translation is marked as “possible” because the original query is ambiguous even for a human reader, so several translations are acceptable. Translating the ambiguous word “drug” is very difficult for MT systems: as the examples show, the correct translation is chosen in some cases and an incorrect one in others. Let us analyze these examples in more detail.
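Unlike an MT system, a dictionary-based query translator does not have to commit to a single sense. The sketch below assumes a made-up English-French dictionary `EN_FR` and splits each query term's weight evenly among its translations, which is one simple CLIR heuristic among several. It keeps both French translations of "drug", médicament (medicine) and drogue (narcotic), trading some noise for robustness: documents using either sense can still match.

```python
from collections import defaultdict

# Hypothetical English->French dictionary; "drug" is ambiguous.
EN_FR = {
    "drug": ["médicament", "drogue"],
    "trafficking": ["trafic"],
    "price": ["prix"],
}


def translate_query(terms, dictionary):
    """Map each source term to all of its translations, splitting the term's
    unit weight evenly among them; unknown terms are kept untranslated."""
    weights = defaultdict(float)
    for term in terms:
        candidates = dictionary.get(term, [term])
        for cand in candidates:
            weights[cand] += 1.0 / len(candidates)
    return dict(weights)


print(translate_query(["drug", "trafficking"], EN_FR))
```

Running it prints `{'médicament': 0.5, 'drogue': 0.5, 'trafic': 1.0}`: the ambiguity is carried into the weighted query instead of being resolved, possibly wrongly, before retrieval.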