Python Code to convert Speech to Text.
What is the pyttsx3 module?
pyttsx3 is a Python module for text-to-speech conversion that works offline, i.e. it makes the computer speak the text you give it. It is compatible with Python 2 and Python 3. Install it from a terminal or command prompt:
pip install pyttsx3
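To get a feel for what pyttsx3 does on its own, here is a minimal sketch. The helper names speak and demo are my own, and the rate of 150 words per minute is just an illustrative value, not something the module requires.

```python
def speak(engine, text, rate=150):
    """Speak `text` through a pyttsx3-style engine and wait until it finishes."""
    engine.setProperty("rate", rate)  # speaking rate in words per minute (illustrative value)
    engine.say(text)                  # queue the utterance
    engine.runAndWait()               # block until the queued text has been spoken


def demo():
    """Speak one sentence with the platform's default driver (needs pyttsx3 and audio output)."""
    import pyttsx3  # imported lazily so `speak` can be tried without the library installed
    speak(pyttsx3.init(), "Hello, this is pyttsx3 speaking.")
```

Calling demo() on a machine with speakers should read the sentence aloud; speak only relies on the engine's setProperty, say and runAndWait methods.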
How to convert Speech Into Text?
We take voice input from the microphone and convert it to text using an online speech recognition API (Google's in this post), so an internet connection is required. We import the speech_recognition module (installed with pip install SpeechRecognition; microphone input also needs PyAudio) to capture the audio data, then initialize the pyttsx3 module and set its voice through the Windows speech API, i.e. sapi5:
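import pyttsx3
import speech_recognition as sr

# Initialize the text-to-speech engine with the Windows SAPI5 driver
engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)

def take_command():  # wrapper function; the name is illustrative
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = r.listen(source)  # record until the speaker pauses
    try:
        print("Recognizing...")
        # Syntax for the Google API: recognize_google(audio, language='en-in')
        query = r.recognize_google(audio)  # use the Google API to convert audio into text
        print(query)
        engine.say(query)
        engine.runAndWait()  # speak the queued text
    except Exception as e:
        print(e)
        return None  # nothing was recognized
    return query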
The listen function collects audio from the microphone and stores it as AudioData in a variable, here named 'audio'. We then send the audio data to the Google API along with the language being spoken. You can also pass a Google API key if you have one; if not, a default generic key is used, which is meant for personal or testing purposes. When the Google API returns the text for the AudioData, we print it. The code sits inside a try block because recognition raises an exception if it can't understand what you said into the microphone. The syntax for the Google API call we used:
recognize_google(audio_data, language='en-US', key=None)
- audio_data = voice collected from the microphone (an AudioData object)
- language = [String] language code, default en-US; for Indian English use en-in
- key = Google API key, if available; default None
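The parameters above can be put together roughly like this. It is only a sketch: transcribe and listen_once are names I made up, and the broad except mirrors the code earlier in the post rather than being the only way to handle errors.

```python
def transcribe(recognizer, audio, language="en-US", key=None):
    """Send captured audio to Google's recognizer; return the text, or None on failure."""
    try:
        # key=None lets the library fall back to its default generic key
        return recognizer.recognize_google(audio, key=key, language=language)
    except Exception as error:
        # The library raises UnknownValueError for unintelligible speech and
        # RequestError when the API cannot be reached; both end up here.
        print("Recognition failed:", error)
        return None


def listen_once(language="en-in"):
    """Capture one phrase from the default microphone and transcribe it."""
    import speech_recognition as sr  # lazy import: needs SpeechRecognition and PyAudio
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return transcribe(recognizer, audio, language=language)
```

For example, listen_once("en-in") would record one phrase and try to recognize it as Indian English, returning None if nothing could be understood.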
We can also change the energy threshold, the handling of background noise, and other settings by overriding their default values. To see the defaults, Ctrl+click on Recognizer() in your editor; it is a Python class, and the values are set inside it:
self.energy_threshold = 300
self.dynamic_energy_threshold = True
self.dynamic_energy_ratio = 1.5
self.pause_threshold = 0.8
self.operation_timeout = None
self.phrase_threshold = 0.3
self.non_speaking_duration = 0.5
These are the default values; override them to suit your microphone's sensitivity.
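Overriding the defaults is just a matter of assigning new values to the attributes on your Recognizer object. Here is a small sketch; the helper names and the numbers 400 and 1.0 are my own illustrative picks, so experiment to find what suits your microphone.

```python
def tune_recognizer(recognizer, energy_threshold=400, pause_threshold=1.0):
    """Override a few Recognizer defaults in place and return the recognizer."""
    recognizer.energy_threshold = energy_threshold  # higher means quieter sounds count as silence
    recognizer.dynamic_energy_threshold = False     # keep the threshold fixed at the value above
    recognizer.pause_threshold = pause_threshold    # seconds of silence that end a phrase
    return recognizer


def listen_with_tuning():
    """Apply the overrides, then record one phrase from the default microphone."""
    import speech_recognition as sr  # lazy import: needs SpeechRecognition and PyAudio
    recognizer = tune_recognizer(sr.Recognizer())
    with sr.Microphone() as source:
        return recognizer.listen(source)
```

Keep pause_threshold at or above non_speaking_duration (0.5 by default), since the library expects that relationship when it listens.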
What can we make using this?
In my upcoming blogs, I will build some micro-projects that put these modules to proper use. Until then, you can make an audio-to-text converter, a book reader (just like Amazon Kindle) that reads a book aloud for you, or a language converter (one that changes English into some other language).
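As a starting point for the book reader idea, here is a rough sketch that reads a plain text file aloud one sentence at a time. All the function names are hypothetical, the sentence splitting is deliberately naive (it breaks after ., ! or ?), and it assumes the book is a UTF-8 text file.

```python
import re


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def read_aloud(engine, text):
    """Queue every sentence on a pyttsx3-style engine; return how many were spoken."""
    sentences = split_sentences(text)
    for sentence in sentences:
        engine.say(sentence)
    engine.runAndWait()  # block until all queued sentences have been spoken
    return len(sentences)


def read_book(path):
    """Read a UTF-8 text file aloud (needs pyttsx3 and audio output)."""
    import pyttsx3  # imported lazily so the helpers above work without the library
    with open(path, encoding="utf-8") as book:
        return read_aloud(pyttsx3.init(), book.read())
```

Something like read_book("my_book.txt") would then read the whole file; the file name here is only a placeholder.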