Reality 2.0: The Convergence of Tech and Imagination

Terminal snapshot from executing Japanese Text Transcription and Translation to English.

Two years ago, a friend of mine asked: "How is it possible that you, of all people, no longer read science fiction?" It was a reasonable question, given that my library includes about two hundred science fiction books and that I am the author of a science fiction book titled "Silicon Virus", published under my pen name, Selir Kyzpem. The short answer is that we now live in science fiction!

SciFi by its nature is dystopian. Technological advances have two faces: they solve existing problems and facilitate processes, but they also pose threats. SciFi acts as an early warning for these threats. Of course, SciFi is also entertaining. In recent decades, however, the trend has been to produce more and more SciFi stories for consumption. As always, more is less!

The most important factor, however, is that the technological advances we are living through move at an unprecedented pace. If one is able to read about what is being developed, work with it, and get running on one's own laptop what seemed like SciFi two years ago, then this might be better than reading a SciFi book of uncertain quality!

Here is a tangible example. When learning a new language, Japanese in my case (yes, I am still doing that!), the learner must address four skills:

  • Reading Comprehension
  • Speaking
  • Listening Comprehension
  • Writing

So, tofugu.com (more specifically www.wanikani.com) covered my need for reading comprehension. It also provides the pronunciation of words and phrases in both a male and a female voice. However, verifying that one can properly pronounce what was heard is a different story. One usually has to attend a course, physically or virtually, with a teacher to address that need. And to have a conversation with a native speaker of a foreign language, one cannot be an absolute beginner (as I am in Japanese)!

Furthermore, suppose that you remember a spoken word in the foreign language but cannot recall what it meant! How are you going to check that?

In our AI era, the solution is relatively simple. What is needed is the ability to:

  • Record the speech (old tech)
  • Feed the recorded speech to software that will reliably convert it to text: Whisper is open-source software from OpenAI that does this efficiently for around 100 languages (although reliability is language dependent). Having built the software (whisper-cpp) from source to mitigate security concerns, I can now see my spoken Japanese transcribed into Hiragana, Katakana and Kanji! A nice extra is that the output text is coloured green where a syllable is well recognized and red where there is ambiguity, so one can experiment with pronunciation to reduce the ambiguity!
  • Finally, the transcribed text can be fed to a locally running LLM through an API (Application Programming Interface) to ask for its translation! For Japanese, one of the models that excels at Japanese-to-English translation is Alibaba's Qwen2.5 LLM with 72 billion parameters (specifically the 47GB qwen2.5:72b-instruct-q4_K_M model, which runs seamlessly on my M4 Max MacBook Pro with 64GB of RAM).
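
The transcription and translation steps above can be sketched in code. The zsh script from this post is not reproduced here, so what follows is only a minimal Python sketch under stated assumptions: that the whisper-cpp build provides an executable named whisper-cli, that the model file sits at a hypothetical path, and that the LLM is served locally by Ollama on its default port (the model tag reads like an Ollama tag, but that is an assumption). Recording is left out; the sketch expects an audio file that whisper-cpp can read, typically a 16 kHz WAV.

```python
import json
import subprocess
import urllib.request

# Every value below is an illustrative assumption, not the actual setup.
WHISPER_BIN = "whisper-cli"                  # assumed whisper.cpp executable name
WHISPER_MODEL = "models/ggml-large-v3.bin"   # hypothetical model path
OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes Ollama serves the LLM
LLM_MODEL = "qwen2.5:72b-instruct-q4_K_M"    # model tag quoted in the post


def whisper_command(wav_path, language="ja"):
    """Build the whisper.cpp command line for one audio file."""
    return [
        WHISPER_BIN,
        "-m", WHISPER_MODEL,  # model file
        "-f", wav_path,       # input audio
        "-l", language,       # spoken language
        "-nt",                # no timestamps, plain text only
    ]


def transcribe(wav_path):
    """Run whisper.cpp and return the transcribed text from stdout."""
    result = subprocess.run(
        whisper_command(wav_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def translation_request(japanese_text):
    """Build the HTTP request that asks the local LLM for a translation."""
    payload = {
        "model": LLM_MODEL,
        "prompt": "Translate the following Japanese text to English. "
                  "Reply with the translation only.\n\n" + japanese_text,
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(japanese_text):
    """Send the request and return the model's answer."""
    with urllib.request.urlopen(translation_request(japanese_text)) as response:
        return json.loads(response.read().decode("utf-8"))["response"].strip()


# Example use (needs whisper-cli on PATH and a running Ollama server):
#   text = transcribe("word.wav")
#   print(text)
#   print(translate(text))
```

For the coloured confidence display, whisper.cpp's --print-colors flag can be added when running it interactively. It is left out of the sketch because the colour escape codes would otherwise end up in the text sent to the LLM.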

At the top of the post, you can see the (zsh) script I created running in Terminal; it orchestrates the interaction with the aforementioned software. So now, when I learn a new word, I am able to check whether I pronounce it correctly. After all, if software understands what I am saying, a human being will too!