Job Opportunity | Machine Learning Engineer (Voice Cloning and Speech Synthesis) at Factored

Machine Learning Engineer (Voice Cloning and Speech Synthesis)

Latin America

Engineering / Full-time / Remote

Who We Are:

Factored was conceived in Palo Alto, California by Andrew Ng and a team of highly experienced AI researchers, educators, and engineers to help address the significant shortage of qualified AI & Machine-Learning engineers globally. We know that exceptional technical aptitude, intelligence, communication skills, and passion are equally distributed around the world, and we are very committed to testing, vetting, and nurturing the most talented engineers for our program and on behalf of our clients.

We are looking for an experienced Machine Learning Engineer with a strong background in text-to-speech (TTS) models and voice cloning technologies. In this role, you will develop and optimize ML models aimed at improving user experience by enabling voice actors to generate content in multiple languages and boost their productivity with voice cloning. You will collaborate closely with our diverse client base to ensure the solution is scalable, reliable, and operates in real time.

What You Will Be Doing:

Design, develop, and optimize text-to-speech models with a focus on maintaining the style and authenticity of the original voice actors.
Implement real-time, scalable voice cloning systems that respond to clients with an inference time of under 1 second.
Collaborate with teams to work on audio datasets that include voice recordings and multilingual transcriptions.
Experiment with models like StyleDiffusion and explore other cutting-edge approaches for human-like, realistic speech synthesis.
Ensure performance reliability across millions of users by scaling up systems to handle high-demand scenarios.
Work with audio data preparation, including splitting, up/downsampling, and file management using tools like Whisper.
Integrate your models into a cloud environment (e.g., AWS) for seamless deployment and monitoring.

What You Will Bring:

Strong proficiency in Python and experience with machine learning frameworks such as TensorFlow or PyTorch.
Proven expertise in speech synthesis models and text-to-speech technologies, with a focus on realistic, human-like outputs.
Experience with voice cloning and familiarity with models like StyleDiffusion or similar.
Ability to deliver real-time solutions with high-performance reliability in production environments.
Experience working with audio datasets, including data preprocessing, splitting, upsampling/downsampling, and file management.
Familiarity with multilingual models and working with transcriptions in multiple languages.
Proficiency with cloud platforms like AWS and experience deploying machine learning models in production environments.
Experience with Whisper or similar tools for handling audio datasets.
Knowledge of traditional ML techniques, including XGBoost or gradient boosting for model optimization.

At Factored, we believe that passionate, smart people expect honesty and transparency, as well as the freedom to do the best work of their lives while learning and growing as much as possible. Great people enjoy working with other passionate, smart people, so we believe in hiring right, and are very selective about who joins our team. Once we hire you, we will invest in you and support your career and professional growth in many meaningful ways. We hire people who are supremely intelligent and talented, but we recognize that intelligence is not enough. Perhaps more importantly, we look for those who are also passionate about our mission and are honest, diligent, collaborative, kind to others, and fun to be around. Life is too short to work with people who don’t inspire you.

We are a transparent workplace, where EVERYBODY has a voice in building OUR company, and where learning and growth are available to everyone based on their merits, not just on stamps on their resume. As impressive as some of the stamps on our resumes are, we recognize that human talent and passion exist everywhere, and come from many backgrounds, so stamps matter much less than results. All of us are dedicated doers and are highly energetic, focusing vehemently on execution because we know that the best learning happens by doing. We recognize that we are creating OUR COMPANY TOGETHER, which is not only a high-performing, fast-growing business, but is also changing the way the world perceives the quality of technical talent in Latin America. We are fueled by the great positive impact we are making in the places where we do business, and are committed to accelerating careers and investing in hundreds (and hopefully thousands) of highly talented data science engineers and data analysts.

In short, our business is about people, so we hire the best people and invest as much as possible in making them fall in love with their work, their learning, and their mission. When not nerding out on data science, we love to make music together, play sports, play games, dance salsa, cook delicious food, brew the best coffee, throw the best parties, and generally have a great time with each other.
