Quick Thought: Universal translator and same-language translator

Quick Thoughts are random thoughts looking for comments

Let’s imagine a universal translator able to translate any language into any other. Sourcing a corpus of translation pairs is a major hurdle. However, there is an almost infinite corpus of translation pairs: a language with itself. Translating English to English is easy, even for a computer.

Let’s give the black-box universal translator three inputs: a source text, the language of the source text, and the language of the desired translation. What would be the consequences, for the learning system inside the black box, of the constraint that when the two languages are the same, the output must be identical to the input?
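A minimal sketch of how such same-language training pairs could be generated from any monolingual corpus (the function name and tuple layout are hypothetical, not part of the post):

```python
# Hypothetical sketch: turn a monolingual corpus into identity
# translation pairs (source, src_lang, tgt_lang, target), where
# the target language equals the source language and the correct
# output is the input itself.
def make_identity_pairs(sentences, lang):
    for s in sentences:
        yield (s, lang, lang, s)

english = ["The cat sat on the mat.", "It is raining."]
pairs = list(make_identity_pairs(english, "en"))
# Each pair asks the translator to map a sentence to itself.
```

Any text in any language can be fed through this, which is what makes the corpus "almost infinite."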

Obviously, the black box could quickly learn that bypassing the translation does the trick. However, that would probably require the internal circuitry to allow for a bypass, and that could be constrained out. So:

  • Could we expect any interesting results?
  • Could the input eventually be forced down to a language-independent universal representation?
  • Let’s say there is a language-independent universal representation kernel. If the input arrives without information about the output language, and the output carries no information about the input language, does that force the network to create a universal representation, or would the kernel just wither away?
  • Is it possible to invert a network? Probably not in a truly bijective way, but enough to model the fact that text representation \(\rightarrow\) universal representation is the inverse (for some definition of the word) of universal representation \(\rightarrow\) text representation in the same language?
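The last question can be illustrated with a toy linear analogy (an assumption for illustration, not the post's model): if the encoder "text \(\rightarrow\) universal representation" were a matrix, the decoder that inverts it on the encoder's image is its pseudo-inverse.

```python
import numpy as np

# Toy linear analogy: encode a 5-dim "text" vector into a 3-dim
# "universal representation", then decode with the pseudo-inverse.
rng = np.random.default_rng(0)
E = rng.normal(size=(3, 5))   # encoder: text -> universal representation
D = np.linalg.pinv(E)         # decoder: best least-squares inverse of E

x = rng.normal(size=5)        # a "text" vector
z = E @ x                     # its universal representation
x_hat = D @ z                 # decoded text

# D is only an inverse on what E can represent: encoding the decoded
# vector recovers z exactly (E @ D is the identity on the 3-dim kernel),
# but x_hat need not equal x, since information was compressed away.
```

For an actual neural network there is no closed-form inverse; the analogous move is to train the decoder with a round-trip reconstruction loss such as \(\lVert \mathrm{decode}(\mathrm{encode}(x)) - x \rVert\), which is exactly what the same-language constraint amounts to.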

Comments welcome.

Emmanuel Rialland
Consultant Finance - Machine Learning