
dc.contributor.author: Vilares, Jesús
dc.contributor.author: Vilares, Manuel
dc.contributor.author: Alonso, Miguel A.
dc.contributor.author: Oakes, Michael P.
dc.date.accessioned: 2015-09-18
dc.date.available: 2017-11-29T11:39:41Z
dc.date.issued: 2015-10-01
dc.identifier.citation: On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks 2016, 36:136 Computer Speech & Language
dc.identifier.issn: 0885-2308
dc.identifier.doi: 10.1016/j.csl.2015.09.004
dc.identifier.uri: http://hdl.handle.net/2436/620920
dc.description.abstract: The field of Cross-Language Information Retrieval relates techniques close to both the Machine Translation and Information Retrieval fields, although in a context involving characteristics of its own. The present study looks to widen our knowledge about the effectiveness and applicability to that field of non-classical translation mechanisms that work at character n-gram level. For the purpose of this study, an n-gram based system of this type has been developed. This system requires only a bilingual machine-readable dictionary of n-grams, automatically generated from parallel corpora, which serves to translate queries previously n-grammed in the source language. n-Gramming is then used as an approximate string matching technique to perform monolingual text retrieval on the set of n-grammed documents in the target language. The tests for this work have been performed on CLEF collections for seven European languages, taking English as the target language. After an initial tuning phase in order to analyze the most effective way for its application, the results obtained, close to the upper baseline, not only confirm the consistency across languages of this kind of character n-gram based approaches, but also constitute a further proof of their validity and applicability, these not being tied to a given implementation.
dc.language.iso: en
dc.relation.url: http://linkinghub.elsevier.com/retrieve/pii/S0885230815000935
dc.subject: Cross-Language Information Retrieval
dc.subject: Character n-grams
dc.subject: Alignment Algorithms for Machine Translation
dc.title: On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks
dc.type: Journal article
dc.identifier.journal: Computer Speech & Language
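
The pipeline outlined in the abstract (n-gram the source query, map each n-gram through a bilingual n-gram dictionary, then match against n-grammed target documents) could be sketched roughly as follows. This is only an illustration, not the authors' system: the function names, the n-gram length of 4, the toy Spanish-to-English dictionary, the two sample documents, and the simple overlap score are all assumptions made for the example. In the paper the dictionary is generated automatically from parallel corpora.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Split each whitespace-separated word into overlapping character n-grams.

    Words no longer than n characters are kept whole (an assumption).
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def pseudo_translate(source_grams, ngram_dict):
    """Replace each source n-gram with its dictionary candidates.

    N-grams missing from the dictionary are dropped.
    """
    translated = []
    for gram in source_grams:
        translated.extend(ngram_dict.get(gram, []))
    return translated


def rank(query_grams, documents, n=4):
    """Rank documents by the number of n-gram occurrences shared with the query."""
    query_counts = Counter(query_grams)
    scores = []
    for doc_id, text in documents.items():
        doc_counts = Counter(char_ngrams(text, n))
        score = sum(min(query_counts[g], doc_counts[g]) for g in query_counts)
        scores.append((doc_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical hand-written dictionary; a real one would be learned from parallel text.
TOY_DICT = {
    "toma": ["toma", "omat"],
    "omat": ["omat", "mato"],
    "mate": ["mato"],
}

# Hypothetical target-language (English) collection.
DOCS = {"d1": "fresh milk", "d2": "tomato soup"}

if __name__ == "__main__":
    query = char_ngrams("tomate")               # source-language query, n-grammed
    translated = pseudo_translate(query, TOY_DICT)
    print(rank(translated, DOCS))               # documents ordered by overlap score
```

Because both the query and the documents are reduced to n-grams, no word-level translation or stemming is needed; the overlap score here merely stands in for whatever retrieval model is actually used for the monolingual step.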

