FluentAI Chatbot for Mandarin Learners

GitHub: https://github.com/xfactor-toml/fluentai_mandarin

1. Summary

The project provides an interactive platform where users can practice and improve their spoken Mandarin through both voice-based and text-based conversation. It combines speech recognition, an AI-driven chatbot, and speech synthesis into a single application for Mandarin learners. Key features include Voice-to-Text Conversion, Chatbot Integration, Speech Output, and Pronunciation Evaluation, which together form a complete practice tool.

2. Responsibilities

I was responsible for designing, developing, and integrating various components of this project.

Key tasks included:

Requirements Analysis: Understanding user requirements and translating them into technical specifications.
Design & Architecture: Creating a scalable architecture that connects various APIs and ensures smooth data flow between components.
API Integration: Implementing API calls to Deepgram for Voice-to-Text conversion and ElevenLabs for Text-to-Speech conversion. Integrating OpenAI's chatbot API to enable interactive conversations.
Frontend Development: Crafting a user-friendly interface for both text-based and voice-based interactions, ensuring a seamless user experience.
Backend Development: Building a robust backend to handle data processing, interactions between APIs, and user data management.
Testing & QA: Conducting rigorous testing to ensure that each feature functions as intended and that the overall system is reliable and efficient.
Deployment: Overseeing the deployment process, ensuring all components are functioning smoothly in the live environment.

3. Technical Overview

Voice-to-Text Conversion:
- Technology Used: Deepgram API
- Functionality: Captures voice input from the user and converts it to text with high accuracy, enabling real-time language practice.
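As a rough illustration of the voice-to-text step, the sketch below builds a request for Deepgram's pre-recorded transcription endpoint and reads the transcript from the response. The helper names, the `nova-2` model, the `zh-CN` language code, and the `audio/wav` content type are assumptions made for this example, not settings taken from the project.

```javascript
// Sketch only: request builder and response reader for Deepgram's
// /v1/listen endpoint. Model, language, and content type are assumed values.
const DEEPGRAM_URL = "https://api.deepgram.com/v1/listen";

function buildTranscriptionRequest(audioBuffer, apiKey, options = {}) {
  const params = new URLSearchParams({
    model: options.model ?? "nova-2",       // assumed model
    language: options.language ?? "zh-CN",  // assumed Mandarin language code
    punctuate: "true",
  });
  return {
    url: `${DEEPGRAM_URL}?${params.toString()}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Token ${apiKey}`,
        "Content-Type": options.contentType ?? "audio/wav",
      },
      body: audioBuffer,
    },
  };
}

// Pulls the top transcript out of a Deepgram-style response body,
// returning an empty string when the expected fields are missing.
function extractTranscript(responseJson) {
  const best = responseJson?.results?.channels?.[0]?.alternatives?.[0];
  return best?.transcript ?? "";
}

// Sends the audio and resolves to the recognized text (Node 18+ for fetch).
async function transcribeMandarin(audioBuffer, apiKey) {
  const { url, init } = buildTranscriptionRequest(audioBuffer, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Deepgram request failed with status ${response.status}`);
  }
  return extractTranscript(await response.json());
}
```

Keeping the request builder separate from the network call makes it easy to try different model and language settings without sending audio.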
Chatbot Integration:
- Technology Used: OpenAI's Chatbot API
- Functionality: Engages users in conversational practice, providing human-like responses to text-based or voice-based queries.
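The chatbot step can be sketched as a call to OpenAI's Chat Completions endpoint with a tutor-style system message placed in front of the conversation history. The prompt text, the `gpt-4o-mini` model name, and the helper names below are hypothetical; the project's actual prompt and model are not documented here.

```javascript
// Sketch only: builds a Chat Completions request for a Mandarin tutor persona.
const OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions";

// Hypothetical system prompt, used purely for illustration.
const TUTOR_PROMPT =
  "You are a friendly Mandarin conversation partner. Reply in simple Mandarin.";

function buildChatRequest(history, userText, apiKey, model = "gpt-4o-mini") {
  // The system prompt comes first, then earlier turns, then the new user turn.
  const messages = [
    { role: "system", content: TUTOR_PROMPT },
    ...history,
    { role: "user", content: userText },
  ];
  return {
    url: OPENAI_CHAT_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Reads the assistant's text from a Chat Completions response body.
function extractReply(responseJson) {
  return responseJson?.choices?.[0]?.message?.content ?? "";
}

// Sends one turn and resolves to the tutor's reply (Node 18+ for fetch).
async function askTutor(history, userText, apiKey) {
  const { url, init } = buildChatRequest(history, userText, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`OpenAI request failed with status ${response.status}`);
  }
  return extractReply(await response.json());
}
```

The same function serves both input modes: typed text is passed in directly, while spoken input is passed in after transcription.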
Speech Output:
- Technology Used: ElevenLabs API
- Functionality: Converts the chatbot's text responses back into speech, allowing users to hear and practice correct pronunciation.
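A minimal sketch of the speech-output step, assuming ElevenLabs' text-to-speech endpoint is called directly over HTTP. The voice ID is supplied by the caller, and the `eleven_multilingual_v2` model ID and the helper names are assumptions for illustration.

```javascript
// Sketch only: builds a text-to-speech request for a given ElevenLabs voice.
const ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech";

function buildSpeechRequest(text, voiceId, apiKey, modelId = "eleven_multilingual_v2") {
  return {
    url: `${ELEVENLABS_TTS_URL}/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      // model_id is an assumed choice; a multilingual model is needed for Mandarin.
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}

// Requests the audio and resolves to a Buffer of MP3 bytes (Node 18+ for fetch).
async function synthesizeSpeech(text, voiceId, apiKey) {
  const { url, init } = buildSpeechRequest(text, voiceId, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`ElevenLabs request failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

The returned buffer can be sent to the browser and played with a standard audio element.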
Pronunciation Evaluation:
- Functionality: The system provides feedback on the user's pronunciation to help them understand areas for improvement. This feature aims to assist users in accurately mastering Mandarin tones and phonetics.
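The evaluation method itself is not described above. One simple approach, shown here purely as an assumption, is to ask the learner to read a target sentence, transcribe the recording, and compare the recognized characters with the target using edit distance. All function names are hypothetical.

```javascript
// Sketch only: transcript-based scoring, an assumed stand-in for the
// project's actual pronunciation evaluation.

// Keeps only Han characters so punctuation and spacing do not affect the score.
function hanziOnly(text) {
  return Array.from(text).filter((ch) => /\p{Script=Han}/u.test(ch));
}

// Levenshtein distance between two arrays of characters,
// computed row by row with dynamic programming.
function editDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,        // deletion
        current[j - 1] + 1,     // insertion
        previous[j - 1] + cost  // substitution or match
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Returns a score in [0, 1], where 1 means the transcript matched the target.
function scorePronunciation(targetText, recognizedText) {
  const target = hanziOnly(targetText);
  const heard = hanziOnly(recognizedText);
  if (target.length === 0) {
    return { score: 0, distance: heard.length };
  }
  const distance = editDistance(target, heard);
  return { score: Math.max(0, 1 - distance / target.length), distance };
}
```

For example, comparing the target 我是学生 with the transcript 我是先生 gives a distance of 1 and a score of 0.75. This only detects errors that change the recognized character, so a tone slip that the recognizer silently corrects goes unnoticed. Finer feedback on tones would need acoustic analysis beyond this sketch.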
System Architecture:
- Frontend: Developed using modern web technologies such as HTML, CSS, and JavaScript to ensure a responsive and interactive user experience.
- Backend: Built using Node.js to handle server-side logic, API integrations, and data processing.
- Database: Utilized MongoDB for efficient data storage and retrieval.
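The data flow between these components can be sketched as a single voice turn: audio in, transcript, chatbot reply, audio out. The `services` object and its three methods are hypothetical stand-ins for the Deepgram, OpenAI, and ElevenLabs wrappers. Passing them in as arguments is a design choice made for this example so that the flow can be exercised without network access.

```javascript
// Sketch only: one voice turn through the pipeline, with the three external
// services injected as async functions (names are assumptions).
async function runVoiceTurn(services, audio, history = []) {
  // 1. Speech recognition: learner audio to Mandarin text.
  const userText = await services.transcribe(audio);

  // 2. Chatbot: prior turns plus the new text produce a reply.
  const replyText = await services.chat(history, userText);

  // 3. Speech synthesis: the reply text becomes playable audio.
  const replyAudio = await services.synthesize(replyText);

  // Return a new history array rather than mutating the caller's copy.
  const updatedHistory = [
    ...history,
    { role: "user", content: userText },
    { role: "assistant", content: replyText },
  ];
  return { userText, replyText, replyAudio, history: updatedHistory };
}
```

A text-based turn follows the same path but skips step 1, and the returned history is what a backend would persist between turns.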

4. Challenges

Accuracy in Voice-to-Text Conversion:
- Challenge: Achieving high accuracy in converting the user’s spoken Mandarin to text, especially given the tonal nature of the language.
- Solution: Extensively testing and fine-tuning the Deepgram API settings to optimize performance for Mandarin input.
Natural Conversational Flow:
- Challenge: Ensuring that the chatbot can handle various conversational scenarios in a natural and intuitive manner.
- Solution: Leveraging OpenAI’s advanced language models, coupled with custom training and fine-tuning, to handle diverse user inputs and provide coherent responses.
Latency in Speech Output:
- Challenge: Minimizing the delay in converting text responses back to speech, which is crucial for maintaining a natural conversation pace.
- Solution: Implementing asynchronous processing and optimizing API call efficiency to reduce latency.
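One way asynchronous processing can reduce speech latency, shown as an assumption rather than the project's actual implementation, is to split a reply into sentences and request audio for all of them concurrently. The `synthesize` argument is a hypothetical async function that wraps the text-to-speech call.

```javascript
// Sketch only: concurrent per-sentence synthesis with order preserved.

// Splits after Chinese or Western sentence-ending marks, keeping the mark
// attached to its sentence and dropping empty fragments.
function splitSentences(text) {
  return text
    .split(/(?<=[。！？!?])/u)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Starts one synthesis request per sentence without waiting in between.
// Promise.all resolves to results in input order, even when later
// sentences finish first.
async function synthesizeInParallel(text, synthesize) {
  const sentences = splitSentences(text);
  return Promise.all(sentences.map((sentence) => synthesize(sentence)));
}
```

Total wait time then approaches that of the slowest sentence instead of the sum of all sentences. In practice the number of simultaneous requests may need a cap to stay within provider rate limits.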
Pronunciation Feedback Mechanism:
- Challenge: Developing an accurate system for evaluating and providing meaningful feedback on user pronunciation.
- Solution: Integrating advanced phonetic analysis tools and incorporating user feedback to continually refine the evaluation algorithms.
User Interface Design:
- Challenge: Creating an intuitive and welcoming user interface that accommodates both novice and advanced learners.
- Solution: Conducting user research and usability testing to design a frontend that meets user needs while maintaining engagement and simplicity.