Cross-language Search: What is it all about?

The term “cross-language search” is used in many different senses:

1. Some search engine providers claim to support multilingual or cross-language search merely because they can index documents written in different languages. They search for exact occurrences of the entered search terms: “war”, for example, finds English documents about military conflict and also German documents in which “war” means “was” (i.e. a function word that carries no topical meaning).

2. Other search engines (see, e.g., www.google.com/intl/en/press/annc/translate_20070523.html) provide a tool that translates the query into a selectable target language and then submits the translated query text. This is certainly progress and can be useful in some specific situations, e.g. if one is looking for a hairdresser in Paris.

Shortcomings:
- If one is looking for “member of the board” and “SAir Group” (Swissair) and searches for German documents, the translated query “Mitglied des Brettes” and “SAir Gruppe” won’t return any results. If “member of the board” is replaced by “Aufsichtsrat”, some documents are found, but documents that use the terms common in connection with the SAir case, “Verwaltungsrat” or “Verwaltungsräte”, are still missed.
- For information research and intelligence services, query translation does not help because it cannot compare and rank documents written in different languages.
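The two failure modes described so far can be reproduced with a small sketch. Everything in it is assumed for illustration: the three sentences in DOCS are invented, and LITERAL and DOMAIN are hypothetical one-entry dictionaries standing in for a generic translator and a domain glossary. It is not how any particular search engine works.

```python
# Toy corpus; the sentences are invented for illustration.
DOCS = {
    "en-1": "the war ended in 1945",
    "de-1": "er war gestern in zürich",
    "de-2": "der verwaltungsrat der sair group trat zurück",
}

def exact_search(phrase, docs=DOCS):
    """Return ids of documents that contain the phrase as a whole-word sequence."""
    words = phrase.lower().split()
    n = len(words)
    hits = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        # Slide a window of n tokens over the document and compare literally.
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits

# Hypothetical word-by-word rendering, standing in for a generic translator.
LITERAL = {"member of the board": "mitglied des brettes"}
# Hypothetical domain glossary with the term used in the Swiss context.
DOMAIN = {"member of the board": "verwaltungsrat"}

# Sense 1: the same string matches in both languages, with unrelated meanings.
same_string_hits = exact_search("war")
# Sense 2: the literal translation matches nothing; the domain term does.
literal_hits = exact_search(LITERAL["member of the board"])
domain_hits = exact_search(DOMAIN["member of the board"])
```

Here same_string_hits contains both the English and the German sentence, literal_hits is empty, and domain_hits contains only the German document about the board. Even with the better term, each language still has to be queried separately, so the results cannot be ranked against one another.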

3. A true cross-language search is possible only if the search engine is able to recognize the thematic content, i.e., if the system realizes that the English translation of a French (or German, etc.) document is equivalent to the original document. This advanced technique is implemented in InfoCodex (see www.infocodex.com). It finds documents in all supported languages simultaneously, without the need for a cumbersome (and arbitrary) translation of the query into every other language. Because of the cross-language content recognition and a well-founded similarity measure, the documents can be ordered by their relevance with respect to the query.
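The idea of matching on content rather than on strings can be sketched as follows. This is a minimal illustration and not a description of how InfoCodex works internally: the CONCEPTS table, the concept ids, the example sentences, and the choice of cosine similarity are all assumptions made for the example. Terms from different languages are mapped to shared concept ids, and documents are ranked by the similarity of their concept counts to those of the query.

```python
import math
from collections import Counter

# Hypothetical concept table: terms in several languages share one concept id.
CONCEPTS = {
    "board": "C_BOARD", "verwaltungsrat": "C_BOARD", "administrateur": "C_BOARD",
    "airline": "C_AIRLINE", "fluggesellschaft": "C_AIRLINE",
    "hairdresser": "C_HAIR", "friseur": "C_HAIR", "coiffeur": "C_HAIR",
}

def concept_vector(text):
    """Count the concepts mentioned in a text; unknown words are ignored."""
    return Counter(CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS)

def cosine(a, b):
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Return ids of documents sharing a concept with the query, best first."""
    q = concept_vector(query)
    scored = [(cosine(q, concept_vector(text)), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps ties in input order
    return [doc_id for score, doc_id in scored if score > 0]

# Invented example documents in three languages.
DOCS = {
    "en-1": "the airline board resigned",
    "de-1": "der verwaltungsrat der fluggesellschaft trat zurück",
    "fr-1": "un coiffeur à paris",
    "de-2": "der verwaltungsrat tagte",
}

ranking = rank("airline board", DOCS)
```

With this toy data, a single English query retrieves the English and the German document about the airline board with equal scores, then the German document that mentions only the board, and leaves out the French hairdresser text. No query translation is involved, and all documents are scored on one scale regardless of language.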
