NLP Part IV: Text-to-Speech API Creation Using FastAPI
If you are looking to create a text-to-speech project in minutes, you are in the right place. There are many ways to implement it, e.g., using a Python library or deep learning models. In this blog, I will use Google Text-to-Speech (gTTS, https://pypi.org/project/gTTS/) and FastAPI to quickly develop an API.
First of all, install the required packages below: save them in a requirements.txt file and run pip install -r requirements.txt.

```text
fastapi
uvicorn
gTTS
```
Let's start with model.py, which defines the input schema. In pydantic, data objects are defined via models. Pydantic is a data parsing and validation library: it parses the input JSON, checks it against the model's type annotations, and stores the required data for further processing.
```python
from pydantic import BaseModel


class InputData(BaseModel):
    data: str
```
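To see what the model does with a request body, here is a small sketch that feeds it a JSON string the way FastAPI would receive it. It assumes pydantic is installed (it is a FastAPI dependency) and repeats the InputData class so it runs on its own; parse_body is a hypothetical helper, and the code only uses calls that behave the same in pydantic v1 and v2.

```python
import json

from pydantic import BaseModel, ValidationError


class InputData(BaseModel):
    # Same shape as model.py: one required string field.
    data: str


def parse_body(raw: str) -> InputData:
    """Hypothetical helper: decode a JSON request body and validate it."""
    return InputData(**json.loads(raw))


ok = parse_body('{"data": "Welcome to text to speech project"}')
print(ok.data)  # Welcome to text to speech project

try:
    parse_body("{}")  # the required "data" field is missing
    rejected = False
except ValidationError as err:
    rejected = True
    print("rejected:", len(err.errors()), "error(s)")
```

FastAPI performs this step for you: a body that fails validation is answered with a 422 error before your endpoint function runs.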
Then create main.py. This will contain the endpoint function where we can call the text-to-speech function.
```python
from fastapi import FastAPI
import uvicorn
from starlette.middleware.cors import CORSMiddleware
from model import InputData
from text2Speech import text2Speech
import sys

# Python version string, reported by the root endpoint
version = f"{sys.version_info.major}.{sys.version_info.minor}"

app = FastAPI()


@app.get("/")
async def read_root():
    # Simple health-check endpoint
    message = f"Hello world! From FastAPI running on Uvicorn with Gunicorn. Using Python {version}"
    return {"message": message}


@app.post("/inputText/")
async def text_to_speech_endpoint(input_data: InputData):
    # FastAPI parses the JSON body into InputData, then we hand it to text2Speech
    return await text2Speech(input_data)


# Allow cross-origin requests (e.g. from a browser front end on another domain)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


if __name__ == "__main__":
    # Start a development server with: python main.py
    uvicorn.run(app, host="127.0.0.1", port=8000)
```
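Once the server is running (for example with uvicorn main:app --reload), any HTTP client can call the endpoint. The sketch below uses only Python's standard library; the base URL http://127.0.0.1:8000 is an assumption (uvicorn's default host and port), and build_request and call_endpoint are hypothetical helpers. It builds a POST request whose JSON body matches InputData; call_endpoint would actually send it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed: uvicorn's default host and port


def build_request(text: str) -> urllib.request.Request:
    """Hypothetical helper: build a POST to /inputText/ with an InputData-shaped body."""
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/inputText/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(text: str) -> dict:
    """Send the request and decode the JSON reply (needs the server to be running)."""
    with urllib.request.urlopen(build_request(text), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request("Welcome to text to speech project")
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8000/inputText/
```

FastAPI also serves interactive documentation at /docs, which is a quick way to try the endpoint from the browser.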
FastAPI is built on top of Starlette. Starlette is a lightweight ASGI (Asynchronous Server Gateway Interface) framework, which is ideal for building async web services in Python. Asynchronous code can pause at some point to wait for a task to finish elsewhere (for example, a network or disk operation) while the program keeps executing other work in the meantime. In Python this is done with the async and await keywords: a function defined with async def is a coroutine, and it should be called with await, as in return await text2Speech(input_data) in main.py.
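A minimal illustration of the idea using only the standard library's asyncio (not FastAPI itself): two coroutines each wait 0.3 s with await asyncio.sleep, and because the event loop runs one while the other is waiting, asyncio.gather finishes in roughly 0.3 s rather than 0.6 s. The last step shows asyncio.to_thread (Python 3.9+), one way to keep a blocking function, such as a synchronous network call, from stalling the loop. The function names are illustrative.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    # 'await' suspends this coroutine; the event loop runs others meanwhile.
    await asyncio.sleep(delay)
    return f"{name} done"


def blocking_work(n: int) -> int:
    # A plain synchronous function: calling it directly would block the event loop.
    time.sleep(0.1)
    return n * 2


async def main() -> tuple:
    start = time.perf_counter()
    results = await asyncio.gather(fake_request("a", 0.3), fake_request("b", 0.3))
    elapsed = time.perf_counter() - start
    # Run the blocking function in a worker thread so the loop stays responsive.
    doubled = await asyncio.to_thread(blocking_work, 21)
    return results, elapsed, doubled


results, elapsed, doubled = asyncio.run(main())
print(results, f"{elapsed:.2f}s", doubled)
```

The same concern applies to any blocking library call made inside an async def endpoint: while it runs, the server cannot switch to other requests.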
Then create text2Speech.py:

```python
from gtts import gTTS
from model import InputData
import base64
from fastapi.responses import FileResponse


async def text2Speech(input_data: InputData):
    inputText = input_data.data
    # Convert the text to English speech at normal speed (gTTS needs internet access)
    tts = gTTS(text=inputText, lang='en', slow=False)
    tts.save("output.mp3")
    # Base64-encode the saved audio (myObj is computed but not returned below)
    with open("output.mp3", "rb") as file:
        myObj = base64.b64encode(file.read())
    return {
        # Point at the same file that tts.save() wrote above
        "speech": FileResponse(path="output.mp3", media_type="audio/mpeg",
                               filename='speech.mp3'),
        "data": inputText,
    }
```
Here, InputData is the model object that holds the input JSON, and the content type of the audio is audio/mpeg. gTTS() takes the text and the language in which to speak it (lang='en' for English; slow=False for normal speed). The speech is then saved to output.mp3. The saved audio is also encoded as base64, which turns it into a text string that can be sent in an API response. In the response itself, I have used FileResponse(). Because it is nested inside a dictionary, FastAPI serializes the FileResponse object's attributes, so what the client receives is the file path and headers rather than the audio bytes. This is the JSON response of the API:
```json
{
  "speech": {
    "path": "E:\\Fastapi\t2s\\output.mp3",
    "status_code": 200,
    "filename": "speech.mp3",
    "send_header_only": false,
    "media_type": "audio/mpeg",
    "background": null,
    "raw_headers": [
      [
        "content-type",
        "audio/mpeg"
      ],
      [
        "content-disposition",
        "attachment; filename=\"speech.mp3\""
      ]
    ],
    "_headers": {
      "content-type": "audio/mpeg",
      "content-disposition": "attachment; filename=\"speech.mp3\""
    },
    "stat_result": null
  },
  "data": "Welcome to text to speech project"
}
```
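If you want the audio itself in the JSON body, one option is to return the base64 string that text2Speech computes in myObj. The sketch below is an alternative under stated assumptions, not the code used above: text_to_payload and fake_synthesize are hypothetical names, and the synthesizer is passed in as a parameter so the example runs without network access. In the real endpoint you could pass lambda text, path: gTTS(text=text, lang="en").save(path). It writes to a unique temporary file, so concurrent requests do not overwrite a shared output.mp3, and it runs the blocking synthesizer with asyncio.to_thread (Python 3.9+).

```python
import asyncio
import base64
import os
import tempfile
from typing import Callable


async def text_to_payload(text: str, synthesize: Callable[[str, str], None]) -> dict:
    """Hypothetical helper: synthesize `text` to a temp MP3 and return it base64-encoded."""
    fd, path = tempfile.mkstemp(suffix=".mp3")  # unique file per request
    os.close(fd)
    try:
        # The synthesizer (e.g. gTTS(...).save) blocks, so run it in a worker thread.
        await asyncio.to_thread(synthesize, text, path)
        with open(path, "rb") as f:
            audio_b64 = base64.b64encode(f.read()).decode("ascii")
    finally:
        os.remove(path)  # clean up the temporary file
    return {"speech": audio_b64, "data": text}


def fake_synthesize(text: str, path: str) -> None:
    # Stand-in for gTTS so this runs offline; writes placeholder bytes, not real MP3 audio.
    with open(path, "wb") as f:
        f.write(b"ID3" + text.encode("utf-8"))


payload = asyncio.run(text_to_payload("hello", fake_synthesize))
audio_bytes = base64.b64decode(payload["speech"])  # what a client does to recover the file
print(payload["data"], audio_bytes[:3])  # hello b'ID3'
```

On the client side, base64.b64decode(response["speech"]) gives back the MP3 bytes, which can be written to a file. Base64 inflates the payload by about a third; for larger audio, returning the FileResponse directly from the endpoint (rather than nesting it in a dictionary) sends the file itself with the audio/mpeg content type.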
Congratulations, you have successfully created an API. You can also access the code here.
I hope this was helpful; please share your thoughts in the comments section. I will be back with more interesting topics in the NLP series. Follow me on Medium or subscribe to my blog to be notified about them. As always, I welcome feedback and constructive criticism, and I can be reached on Twitter @RoutraySomesh & Gmail.