Translation Practice 38

「急速に進化した機械翻訳」に、それでもできない３つのこと

翻訳者には、失業の心配はありません

西田宗千佳

フリージャーナリスト

ここ2年で、機械翻訳の精度が大幅に改善している。

現時点でも完全にはほど遠いが、以前は文章の体をなしていなかったものが、かなり意味の通る文になってきた。我々が海外から受け取るメールやアプリの説明書きなども、機械翻訳されたものが多くなってきている。必ずしも自然な日本語ではないが、とりあえず「読める」「わかる」ものになり、実用的になったからこそ利用が進んでいるのだろう。

また、カメラで撮影した画像の中に含まれる言葉を機械翻訳してくれるアプリも登場している。

画像中に含まれる文字情報だけを抽出して、多言語に翻訳してくれるもので、旅行中に看板などを撮影して、そこに書かれている内容を把握する際などにとても便利なサービスだ。音声を聞き取って、翻訳したうえで自国語で話してくれる「ほんやくコンニャク」のようなアプリも登場している。かつて憧れたドラえもんの道具が、形を変えて実現しつつあるといえよう。

このような変化の背景にあるのは「ディープラーニング（深層学習）」である。ご存じのとおり、昨今は「AI」（人工知能）として、ひと言でまとめられてしまうことも多い概念だ。

ディープラーニングの登場で、機械翻訳はどう変化したのか？　何が起きていて、どんな限界があるのか？　あらためて読み解いてみよう。

急速に進化する「ディープラーニング型」機械翻訳

AIという言葉は、決して新しいものではない。

コンピュータの歴史は「人工知能」探求の歴史でもあり、これまでにも何度かAIブームが起きてきたが、現在のブームは「ディープラーニングのブーム」といっても過言ではない。

ディープラーニングがどういう技術で、機械翻訳にどんな役割を果たしているのか？　それは、言葉の「意味」をどう考えるか、という点で表すことができる。過去との比較で語ったほうがわかりやすいだろう。

機械翻訳ではもともと、文章や単語のもつ具体的な意味にはいっさい踏み込んでいなかった。用例をもとに読み替え、翻訳していく。「今日は晴れです＝Today is sunny day」といった相互に対応する文例を多数用意し、逐一置き換えを行う「辞書的翻訳」だったのだ。それはまさに“機械”的な翻訳であり、置き換えの情報をひたすらつくっていく必要があることから、翻訳の精度を上げるのが困難だった。

そこで登場したのが「統計的機械翻訳」だ。

大量に文例を集めたうえで各単語を記号化し、統計処理に基づいて機械的に処理していくことで、翻訳のためのデータベースができ上がる……というしくみだ。単語はあくまで記号として扱われるため、どんな意味の文書が処理されているのか、ソフトウエア側はまったく把握していない。

そこに大きな変化が起きた。きっかけとなったのは、2014年にグーグルが発表した「自動翻訳にディープラーニングを活用する」という論文である。この論文をもとに、2年ほど前から、ネット上の機械翻訳サービスは、ディープラーニングを使ったものへと切り替えが進んでいる。

ディープラーニングは「言葉の意味」を学習しない

ディープラーニングは現在、俗に「AI」とよばれる技術の中核となっている考え方だ。「大量の情報と答えの例から人間が学ぶのに近いやり方でひたすら学習して、ルールを自動的につくる」方法、と説明することができる。その結果、従来はルール化が難しかった、非常にあいまいなものを判断するソフト開発に向いている。

画像認識や音声認識にも活用され、「猫を見分ける」「人の顔を見分ける」「声を認識する」といった処理の精度が劇的に向上している。冒頭で、カメラの画像から文字の部分だけを翻訳する例を紹介したが、これも、画像認識・文字認識の能力が、ディープラーニングの導入で格段に向上したことによる成果だ。

こうしたしくみを自動翻訳におけるルール作りに使ったのが、現在主流となりつつある「ディープラーニングによる自動翻訳」である。とはいえ、このディープラーニングにおいても、「AI」という言葉から想起されるような「意味の解釈」は行われていない。

だが、「意味をふまえる」ことには一歩踏み込んでいるのが特徴だ。ディープラーニング・ベースでの機械翻訳では、語感や語順など、文章の流れも考慮したうえで「どういう空間にどういう情報とともに配置されるか」を重視している。

たとえば「匙（さじ）」と「スプーン」は、ディープラーニングによる学習の結果、近い空間に、似た情報をもって存在するのだという。だから「両者は似たような意味である」と判断されて、翻訳に使われる。翻訳対象となる文章が多いほど、そうした判定の精度が向上しやすいため、文章を見ると「意味をふまえて、従来よりも自然な文章ができ上がっているように見える」のだという。

これが、ディープラーニングで機械翻訳の精度が上がった理由である。

各社は現在、機械翻訳用のエンジンをディープラーニング・ベースに置き換えているが、それにもやはり理由がある。各社がディープラーニングを支持しているのは「まだ伸びしろが大きい」からなのだ。

統計的機械学習などの過去の手法は、おおむね20年にわたって研究されてきた。各社の評価として、統計的機械学習による翻訳精度の向上は「踊り場状態」にあり、近い将来に劇的な成長を見込める状態にはない。

だが、ディープラーニングによる翻訳は、論文の発表から実用化まで、わずか2年ほどしか研究されていないにもかかわらず、すでに統計的機械学習による翻訳の精度を超えている。しかも、まだまだ向上の余地がある。そのような将来性に対する評価から、各社はいっせいにディープラーニング・ベースへと舵を切ったのである。

「ニュアンス」「非言語」に踏み込めない機械翻訳

では、機械翻訳は今後、どこまで精度を上げられるのか？

「おそらくすぐに、中学生レベルの英語は不要になるだろう」と、研究者の意見は一致している。また、記述のルールが定まった文書、たとえば特許文書や法律、論文といったものであれば、翻訳の精度は相当に高くなっていくだろう。書く側が「機械翻訳に配慮」し、文学的な（少々あいまいなニュアンスを含む、と言い換えてもいい）表現を使わずに書かれた文章も、翻訳精度は高くなる。

その先にはいずれ、翻訳者が不要になる時代が来るのだろうか？

研究者の間では、「そうはならないだろう」という意見が共通の見解となっている。

理由は複数ある。

機械翻訳が役に立たない、と関係者の意見が一致するのが「小説」「シナリオ」の類いだ。なぜこれらのコンテンツの翻訳に使えないかというと、ディープラーニングでは単語の大まかな意味は扱っていても、「文章の意味」そのものを解釈して翻訳しているわけではないからだ。

「おまえはバカだ」という同じ文章でも、その意味するところは、前後の文脈で大きく異なる。そのニュアンスをとらえて文章全体を翻訳するのは、現在のディープラーニング機械翻訳では難しいのだ。

機械学習ならではの問題もある。たとえば、ある文章中に登場する女性であるはずの人物を「彼」と誤訳する例があったという。間違えた理由は、その人物の職業には一般に男性が就くことが多いことから、サンプルとなった文章から「その職業の場合には男性形で訳す」という誤った学習をしてしまっていたからだ。

学習の結果はつねに正しいとは限らず、そのような部分を指摘して修正するのは当然、人間の仕事になる。ディープラーニングの「誤った学習結果」の洗い出しは今後、大きな課題になるはずだ。

もうひとつ、本質的な問題がある。

たとえば現在の機械翻訳では、同時通訳の精度は高くない。まず音声をテキスト化し、その後に翻訳するという流れになるため、「テキスト化」と「翻訳」という2つの関門があり、そもそも不利であるのは否めない。しかし、ディープラーニングによる音声認識の登場により、「テキスト化」の精度は格段に上がっている。問題は、「テキストから翻訳する」ことそのものに起因している。

人の会話内容（話し言葉）は、文章（書き言葉）に比べて非常に粗い。言葉が抜け落ちたり、語順がおかしかったり言いよどんだり、あらためてよく考えると意味が通じていなかったりもしている。人間どうしの会話なら、そこからうまく意味を汲み取ることができる。それに近いことができるAIが登場しなければ、同時通訳の精度は決して上がらない。

また、同じコミュニケーションであっても、「テキストだけ」の場合と、「音声での通話」、さらに「実際に対面しての会話」とでは、誤解が生じる可能性がまったく違ってくる。人間は、非言語でのコミュニケーションも活用しており、特に同時通訳者は、そうしたニュアンスも汲み取って翻訳する。

たとえば英語圏の人物が、ある単語を話しながら両手の人差し指、中指、親指で挟み込むようなジェスチャーをしたら、クオーテーションマーク(“　”)で単語を括った、というニュアンスがあることから、即座に「いわゆる」という言葉に置き換えるというが、テキスト情報だけに頼る機械翻訳では、そうした部分のカバーは難しい。

声優の名演も台無しに

先ほどの、「おまえはバカだ」というフレーズにしても、「おまえは……、バカだ！」というふうに言った場合と、「おまえは、バ・カ・だ」と愛嬌のある声で言ったときとでは、まったくニュアンスが異なる。たとえば映画の吹き替えでは、当然それらのニュアンスの違いを把握したうえで台本がつくられている。だが、声優によるせっかくの演技も、機械翻訳では平板な言葉に置き換えられてしまうだろう。

「言葉」はそもそも、それが使われている国の文化に根ざしている。同じ言葉・会話であっても、国によって実際に人が受け取るニュアンスが違うことも多いが、現在の機械翻訳は、そのようなニュアンスを誤訳しやすい傾向にある。

なぜなら、現状の翻訳エンジンは、各国の文化的背景までは考慮に入れていないからである。こうした要素は、実は人間でも誤訳に結びつきやすい一因なのだが、それは機械翻訳でも変わらない。

翻訳の精度が上がってきたことで、あらためて気づかされるのは、人間はいかに多彩な情報を用いて、複数の角度から「言葉」を使っているか、ということだ。現在の機械翻訳は、どうしても「目の前のテキストの内容を解釈しての翻訳」に特化している。だからこそ、細かなニュアンスを必要とする翻訳は苦手なのである。

ここで列挙したような問題も、いずれは技術のさらなる進展によって解決されるのかもしれない。しかし、少なくとも当面は無理だ。したがって現時点では、機械翻訳は「カジュアルに、多少の間違いを許容しつつ使う」、あるいは「企業内での下訳に使う」のが現実的な利用法だろう。

限定的な使用環境であることは否めないが、それでも機械翻訳は、人間に比べて圧倒的にコスト効率が高く、コミュニケーションの助けになることも間違いない。東京五輪を控え、海外からの渡航者が増加傾向にある今の日本で、「会話の糸口くらいはつかめる」ことのメリットは十分にある。

機械翻訳はあくまで「ツール」だ。ツールが便利かどうかは結局、使う側が決めることである。やがて我々の文化や言葉のニュアンスまで読みとるAIが登場してくれば、興味深い「パートナー」になりうるだろうが、それはまだ、かなり先の出来事である。

https://gendai.ismedia.jp/articles/-/55237?page=1

Machine translation has evolved rapidly... but there are still three things it can't do

There's no need for translators to fear for their jobs just yet

By Nishida Munechika, freelance journalist

The accuracy of machine translation has improved dramatically over the last two years. It is still far from perfect, but output that once barely qualified as sentences now largely makes sense. Many of the emails and app descriptions we receive from overseas have been machine translated. Okay, so it's not always natural-sounding Japanese, but it can be read and understood, and it is precisely this practicality that has driven its growing use.

We've also seen apps that use the camera to read and translate text. They pick out just the text within an image and translate it into a range of languages, which is very handy when you're travelling and want to know what a sign you've photographed says. There are even apps that work like Doraemon's 'Honyaku Konnyaku' (translation konjac): they listen to speech, translate it and speak the result in your own language. The Doraemon gadgets we once longed for are becoming reality, albeit in a different form.

Behind these changes is something called 'deep learning', a concept that these days is often lumped together under a single label: AI, or artificial intelligence. So how has deep learning changed machine translation? What is actually going on, and where are its limits? Let's take a fresh look.

The rapid evolution of Neural Machine Translation

'AI' is not a new term. The history of computing is also a history of the quest for artificial intelligence, and there have been several AI booms before now, but it's no exaggeration to say the current one is really a deep learning boom.

What kind of technology is deep learning, and what role does it play in machine translation? The answer lies in how it deals with the 'meaning' of words, and it's easiest to explain by comparison with what came before.

Originally, machine translation did not engage with the concrete meaning of words or sentences at all. It worked from examples, swapping in stored equivalents: large numbers of paired example sentences, such as '今日は晴れです = Today is sunny', were prepared in advance, and the system substituted them one by one. This 'dictionary translation' was mechanical in the truest sense, and because the substitution data had to be created endlessly, entry by entry, its accuracy was hard to improve.

This is where Statistical Machine Translation (SMT) came in. SMT collects huge numbers of example sentences, encodes each word as a symbol and processes them mechanically using statistics, building up a database for translation. Because words are treated purely as symbols, the software has no idea what the documents it processes actually mean.

Then came a big change. It started with a paper Google published in 2014 on applying deep learning to automatic translation. Building on that work, online machine translation services began switching about two years ago to systems based on deep learning, known as Neural Machine Translation (NMT).
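To make the limitation of 'dictionary translation' concrete, here is a deliberately simplified Python sketch of whole-sentence substitution. It assumes a hand-built table of sentence pairs; the table and the function name `dictionary_translate` are hypothetical, and real systems of that era were considerably more elaborate. The point is only that every new sentence needs a new entry.

```python
from typing import Optional

# Hypothetical hand-built table of paired sentences. Every sentence the
# system can handle has to be entered here in advance.
EXAMPLE_PAIRS = {
    "今日は晴れです": "Today is sunny.",
    "今日は雨です": "Today is rainy.",
}


def dictionary_translate(sentence: str) -> Optional[str]:
    """Return the stored translation if the whole sentence matches an entry, else None."""
    return EXAMPLE_PAIRS.get(sentence)


print(dictionary_translate("今日は晴れです"))  # Today is sunny.
print(dictionary_translate("明日は晴れです"))  # None: no entry for "tomorrow"
```

Changing a single word ('今日', today, to '明日', tomorrow) is enough to defeat the lookup, which is why accuracy could only be raised by adding ever more entries.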

Deep learning doesn't learn what words mean

Deep learning is the idea at the core of what is popularly called 'AI' today. Put simply, it creates rules automatically by learning relentlessly from vast amounts of information and example answers, in a way similar to how humans learn. That makes it well suited to software that has to make judgements about highly ambiguous things that used to be hard to pin down with rules.

Deep learning is also used in image and speech recognition, where it has dramatically improved accuracy at tasks such as identifying cats, recognizing human faces and recognizing voices. The camera apps mentioned earlier, which translate just the text in an image, are themselves a product of the huge gains in image and character recognition that deep learning brought.

Neural Machine Translation, which is fast becoming the mainstream, applies the same approach to creating the rules for automatic translation. Even so, deep learning does not perform the kind of 'interpretation of meaning' that the word 'AI' might suggest.

What it does do is take a step towards taking meaning into account. NMT considers the feel of words, word order and the overall flow of a text, and attaches great importance to where in a 'space' each word is placed and what information surrounds it. For example, as a result of deep learning, the words 匙 (saji) and スプーン (supūn), both meaning 'spoon', reportedly end up close together in that space, carrying similar information. The system therefore judges them to have similar meanings and uses that in translation. The more text it processes, the more accurate these judgements tend to become, so its output appears to take meaning into account and reads more naturally than before.

That is why deep learning has improved the accuracy of machine translation.

Companies are now replacing their machine translation engines with deep-learning-based ones, and there is a reason for that too: they are backing deep learning because it still has plenty of room to grow. Older techniques such as SMT have been researched for around two decades, and companies consider the accuracy gains from SMT to have plateaued, with no dramatic progress expected in the near future. NMT, by contrast, already outperforms SMT even though only about two years passed between the paper's publication and practical deployment, and it still has a lot of room for improvement. It is this assessment of its future potential that led companies to switch course to NMT all at once.
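The idea of two words sitting 'close together in a space' can be illustrated with cosine similarity, one common way of measuring how close two word vectors are. This is a toy sketch under loud assumptions: the three-dimensional vectors are invented, and the article does not say that any particular translation engine compares words exactly this way.

```python
import math

# Invented 3-dimensional vectors for illustration only; learned word vectors
# in real systems typically have hundreds of dimensions.
VECTORS = {
    "saji": [0.90, 0.80, 0.10],   # 匙
    "spoon": [0.85, 0.82, 0.12],  # スプーン
    "cloud": [0.05, 0.10, 0.95],  # an unrelated word for contrast
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_spoon = cosine_similarity(VECTORS["saji"], VECTORS["spoon"])
sim_cloud = cosine_similarity(VECTORS["saji"], VECTORS["cloud"])
print(f"saji vs spoon: {sim_spoon:.3f}")
print(f"saji vs cloud: {sim_cloud:.3f}")
```

With these made-up numbers, the two 'spoon' words score very close to 1 while the unrelated word scores far lower, which is the sense in which a system can treat them as 'similar in meaning' without understanding either.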

Machine translation simply doesn't get nuance or non-verbal communication

So how much further can machine translation accuracy go? Researchers agree that junior-high-school-level English will probably soon become unnecessary. For documents written to fixed conventions, such as patents, legal texts and academic papers, translation accuracy should also become very high. The same goes for text written with machine translation in mind, where the writer avoids literary expressions (or, to put it another way, expressions with somewhat ambiguous nuances).

Will there eventually come a time, then, when translators are no longer needed? The consensus among researchers is that there won't.

There are several reasons for this. The one area where everyone agrees machine translation is of no use is novels and screenplays. That is because, while deep learning deals with the rough meanings of words, it does not interpret the meaning of the text itself when translating. Even a sentence like 'You are an idiot' can mean very different things depending on the context, and capturing that nuance while translating the text as a whole is still difficult for today's deep-learning machine translation.

There are also problems unique to machine learning. In one reported case, for example, a character in a text who should have been a woman was mistranslated as 'he'. The error arose because her occupation is one generally held by men, so the system had wrongly learned from its sample texts to 'use the male form for this occupation'. Learning results are not always correct, and naturally it falls to humans to spot and fix such errors. Rooting out these faulty learning results is likely to become a major challenge for deep learning.
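The mechanism behind the 'he' error can be caricatured in a few lines of Python. This is a hypothetical sketch, not how any real NMT engine works internally: the occupation, the sample counts and the function `learned_pronoun` are all invented, and 'learning' is reduced to picking the most frequent pronoun. It does show how a skew in the training data can become a rule that is applied regardless of what the sentence itself says.

```python
from collections import Counter

# Invented sample data: (occupation, pronoun) pairs as they might appear in
# training sentences. "engineer" and the 9:1 skew are hypothetical.
SAMPLES = [("engineer", "he")] * 9 + [("engineer", "she")] * 1


def learned_pronoun(occupation, samples):
    """Naive 'rule learning': return the pronoun seen most often with this occupation."""
    counts = Counter(pronoun for occ, pronoun in samples if occ == occupation)
    return counts.most_common(1)[0][0]


# The rule ignores anything in the sentence being translated that shows the
# person is a woman, so it outputs "he" every time.
print(learned_pronoun("engineer", SAMPLES))  # he
```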

There is also a more fundamental problem. Current machine translation is still not very accurate at simultaneous interpretation. Speech first has to be converted to text and then translated, so there are two hurdles to clear, which undeniably puts it at a disadvantage from the start. That said, deep-learning-based speech recognition has dramatically improved the accuracy of the speech-to-text step. The real problem lies in translating from text in the first place.

Compared to written language, spoken language can be all over the shop. Words get dropped, word order gets muddled, speakers hesitate, and some things, when you think them over, don't actually make sense. In conversation between humans, we can still pick up the intended meaning. Until AI can do something similar, the accuracy of machine simultaneous interpretation will not improve.

What's more, even for the same exchange, the potential for misunderstanding differs enormously between text alone, a voice call and a face-to-face conversation. Humans also communicate non-verbally, and simultaneous interpreters in particular pick up on these cues when they translate. For example, if an English speaker makes 'air quotes' with both hands while saying a word, signalling that the word is in quotation marks, interpreters will reportedly render it straight away as いわゆる ('so-called'). Machine translation that relies on text alone struggles to cover this kind of thing.

A sublime voice acting performance goes to waste

Take our example phrase, 'You are an idiot.' Said as 'You are... an idiot!', it means something quite different from the same words delivered teasingly, in an affectionate tone. In film dubbing, naturally, the script is written with these differences in nuance in mind. Put the dialogue through machine translation, though, and all the craft in a voice actor's performance would likely be flattened into bland text.

Language is rooted in the culture of the country where it is used. The same words or conversation can often carry a different nuance depending on the country, and current machine translation tends to mistranslate such nuances, because today's translation engines do not take each country's cultural background into account. These factors are actually a common cause of mistranslation by humans too, and machines are no different.

As translation accuracy has improved, we've been reminded just how much varied information humans draw on, and from how many angles we use language. Current machine translation is inevitably focused on interpreting and translating the text in front of it, which is exactly why it struggles with translation that requires fine nuance.

The problems listed here may eventually be solved by further advances in technology, but not any time soon. For now, then, the realistic ways to use machine translation are casually, accepting a few mistakes here and there, or to produce rough first drafts within companies.

Okay, so it can only be used in limited situations, but even so, machine translation is far more cost-effective than human translators, and there's no doubt it helps people communicate. With the Tokyo Olympics approaching and more and more visitors arriving from overseas, there is real value for Japan in being able to at least get a conversation started.

Machine translation is ultimately a tool, and whether a tool is useful is decided by the people who use it. If an AI eventually emerges that can read our culture and the nuances of our language, it could become a fascinating 'partner', but that is still a long way off.

D Baker - Japanese-to-English translator

------------------------- Thank you for reading my translations. Leave your Japanese-to-English translation to me! I look forward to hearing from you. ------------------------- Translation: debra_baker@hotmail.co.uk Tutoring: @grammargopher