How to use Google Speech Recognition API in Python?


Stack Overflow might not be the best place to ask this question, but I need help. I have an mp3 file and I want to use Google's speech recognition to get the text out of that file. Any ideas where I can find documentation or examples would be appreciated.

Take a look at the Google Cloud Speech API, which enables developers to convert audio to text [...] The API recognizes over 80 languages and variants [...] You can create a free account to get a limited number of API requests.

How to:

You first need to install the gcloud Python module and the google-api-python-client module with:

pip install --upgrade gcloud
pip install --upgrade google-api-python-client
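Before going further, it can help to confirm that the modules the tutorial script imports are actually visible to your interpreter. The sketch below is only an illustration: the helper name `missing_modules` is made up here, and the list assumes the three top-level import names used by the script later in this answer (whether the two pip commands pull in all of them may depend on the package versions you get).

```python
import importlib.util

# Top-level import names used by tutorial.py (assumed from the script in
# this answer); note they differ from the pip package names.
REQUIRED_MODULES = ["googleapiclient", "httplib2", "oauth2client"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported missing, installing it with pip before continuing should save a confusing ImportError later.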

Then, in the Cloud Platform Console, go to the Projects page and select or create a new project. After that, you need to enable billing for your project, then enable the Cloud Speech API.

After enabling the Google Cloud Speech API, click the Go to Credentials button to set up your Cloud Speech API credentials.

See Set Up a Service Account for information on how to authorize access to the Cloud Speech API service from your code.

You should obtain both a service account key file (in JSON) and a GOOGLE_APPLICATION_CREDENTIALS environment variable that will allow you to authenticate to the Speech API.

Once all that is done, download the audio raw file from Google, and also speech-discovery_google_rest_v1.json from Google.

Modify the previously downloaded JSON file to set your credentials key, then make sure that you have set your GOOGLE_APPLICATION_CREDENTIALS environment variable to the full path of the .json file with:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_file.json

Also, make sure that you have set your GCLOUD_PROJECT environment variable to the ID of your Google Cloud project:

export GCLOUD_PROJECT=your-project-id
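A quick way to catch a forgotten export is to check both variables from Python before calling the API. This is a minimal sketch, not part of Google's tooling: `check_environment` is a hypothetical helper, and it only verifies that the variables are set and that the key path points at an existing file, not that the key itself is valid.

```python
import os

# The two environment variables described above.
REQUIRED_VARS = ["GOOGLE_APPLICATION_CREDENTIALS", "GCLOUD_PROJECT"]


def check_environment(environ):
    """Return a list of human-readable problems found in `environ`."""
    problems = []
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append(name + " is not set")
    # If the credentials variable is set, it should point at a real file.
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path and not os.path.isfile(key_path):
        problems.append("key file not found: " + key_path)
    return problems


for problem in check_environment(os.environ):
    print(problem)
```

The function takes the environment as an argument so it can be tried against a plain dict as well as `os.environ`. It prints nothing when both variables look fine.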

Assuming all of that is done, you can create a tutorial.py file which contains:

import argparse
import base64
import json

from googleapiclient import discovery
import httplib2
from oauth2client.client import GoogleCredentials


DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?'
                 'version={apiVersion}')


def get_speech_service():
    # Load the application default credentials (from the file named by
    # GOOGLE_APPLICATION_CREDENTIALS) and scope them to Cloud Platform.
    credentials = GoogleCredentials.get_application_default().create_scoped(
        ['https://www.googleapis.com/auth/cloud-platform'])
    http = httplib2.Http()
    credentials.authorize(http)

    return discovery.build(
        'speech', 'v1beta1', http=http, discoveryServiceUrl=DISCOVERY_URL)


def main(speech_file):
    """Transcribe the given audio file.

    Args:
        speech_file: the name of the audio file.
    """
    # The API expects the audio bytes as base64 text inside the JSON body.
    with open(speech_file, 'rb') as speech:
        speech_content = base64.b64encode(speech.read())

    service = get_speech_service()
    service_request = service.speech().syncrecognize(
        body={
            'config': {
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRate': 16000,  # 16 kHz
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8')
                }
            })
    response = service_request.execute()
    print(json.dumps(response))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'speech_file', help='Full path of audio file to be recognized')
    args = parser.parse_args()
    main(args.speech_file)
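The part of the script that is easiest to get wrong is the request body, so here is that step on its own, with no network call or credentials needed. It is a sketch only: `build_request_body` is a name invented for this example, and the field names simply mirror the v1beta1 body used in the script above.

```python
import base64


def build_request_body(audio_bytes, sample_rate=16000, language_code="en-US"):
    """Build a syncrecognize-style request body for raw LINEAR16 audio bytes."""
    return {
        "config": {
            "encoding": "LINEAR16",       # raw 16-bit signed little-endian samples
            "sampleRate": sample_rate,    # must match the audio file's real rate
            "languageCode": language_code,
        },
        "audio": {
            # base64 output is pure ASCII, so decoding gives a JSON-safe string.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


# Four dummy bytes stand in for real audio data here.
body = build_request_body(b"\x00\x01\x02\x03")
print(body["audio"]["content"])
```

Keeping this step in its own function makes it easy to check that the sample rate and language you send match the file you recorded, which is a common cause of empty results.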

Then run:

python tutorial.py audio.raw 
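The script prints the whole JSON response. If you only want the recognized text, something like the following may help. Treat it as a sketch under assumptions: `extract_transcripts` is a hypothetical helper, and it assumes the response is a dict with a `results` list whose items each carry an `alternatives` list with `transcript` fields. The sample response below is made up for illustration and is not real API output.

```python
def extract_transcripts(response):
    """Return the first transcript of each result in a recognize response dict."""
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Assumes the first alternative is the most likely one.
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


# Hypothetical response, invented for this example only.
sample = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.9}]}
    ]
}
print(" ".join(extract_transcripts(sample)))
```

Using `.get` with defaults means an empty response (for example, when nothing was recognized) yields an empty list instead of a KeyError.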
