Job Openings
Research Internship: Self-Improving Data Processing Engine for State-of-the-Art Generative AI Speech Models
Full-time | Voice & Conversational AI | Global Enterprise AI Platform
Duration: 4-8 Months
Location: Switzerland (Europe), on-site at AGIGO’s Zurich Office
About AGIGO
AGIGO™ is the first enterprise-grade conversational AI platform that empowers enterprises to transform customer engagement and business performance with high-agency AI agents: agents that match well-trained human customer agents in naturalness, responsiveness, and autonomous task resolution. Built for on-premises or hybrid deployment, with no reliance on third-party services, our proprietary platform gives enterprises full control, observability, and data sovereignty. Its unified core, tunable base models, and end-to-end design toolchain deliver context-aware, adaptable agents that engage directly with customers in real time. Founded in February 2025 in Switzerland by a team of 18 experienced AI pioneers, AGIGO is driven by a bold vision to lead the next major wave in AI by transforming how businesses interact with their customers.
Your Research Mission
The single biggest bottleneck in building next-generation AI models is not the models themselves; it is the data. In this research internship, you will design and build a self-improving data processing engine for large-scale ASR and TTS model training. You will not just clean data: you will develop a next-generation data engine that directly powers AGIGO production models. Your primary focus will be on the modern challenge of detecting and purging low-quality and synthetic data from massive, untrusted web-scale corpora (for example, from archive.org).
Phase 1: Establish State-of-the-Art Baseline
You will build a highly scalable, parallelized data processing pipeline inspired by industry best practices (such as NVIDIA's NeMo Granary). This will serve as the robust foundation for more advanced research. This phase involves implementing and optimizing a multi-stage workflow:
Audio Canonicalization: Ingesting raw audio in any format and standardizing it (e.g., resampling to a target rate, downmixing to mono, and normalizing codecs).
Initial Transcription & Alignment: Employing a multi-pass approach using models like FasterWhisper for initial transcription, language identification (LID), and rough timestamp generation.
Segmentation & Grooming: Implementing robust algorithms to slice long audio segments into clean sentence-like utterances, intelligently handling speaker turns and non-speech events.
Text Restoration & Normalization: Using powerful LLMs (e.g., Llama 3) to restore punctuation and capitalization, followed by text normalization to handle numbers, acronyms, and symbols.
Heuristic Filtering: Implementing a baseline set of filters that remove data based on duration, word count, character set, words-per-second ratio, repeated n-grams, perplexity, audio-text embedding scores, and more. In total, we have identified more than 50 different steps for extracting metadata from audio files to drive these filtering stages (see the sketch after this list).
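To make the workflow concrete, here is a minimal sketch of the first and last of these stages, assuming torchaudio as the audio backend; the target sample rate and all filter thresholds are illustrative assumptions, not the pipeline's actual configuration:

```python
# Minimal sketch of two Phase 1 stages: canonicalization and heuristic
# filtering. Library choice (torchaudio) and thresholds are assumptions.
import torch
import torchaudio

TARGET_SR = 16_000  # assumed target sample rate

def canonicalize(path: str) -> torch.Tensor:
    """Load an audio file, downmix to mono, and resample to TARGET_SR."""
    waveform, sr = torchaudio.load(path)           # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != TARGET_SR:
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    return waveform

def heuristic_keep(waveform: torch.Tensor, transcript: str,
                   min_dur: float = 1.0, max_dur: float = 30.0,
                   max_wps: float = 6.0) -> bool:
    """Baseline keep/drop decision from duration and words-per-second."""
    duration = waveform.shape[1] / TARGET_SR
    words = transcript.split()
    if not words or not (min_dur <= duration <= max_dur):
        return False
    return len(words) / duration <= max_wps
```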
Phase 2: Novel Research Beyond the Baseline
This is where you will move beyond the baseline and introduce novel research to tackle the most difficult data quality challenges. The goal is to replace simple heuristics with intelligent, model-based scoring functions.
Cross-Modal Coherency Scoring: A key innovation will be to assess whether the audio and text are a good match. You will research and build a model that scores cross-modal coherency, flagging inconsistencies such as positive-sounding audio paired with text describing a negative event, a strong indicator of a mismatched or synthetic pair. This is a more advanced line of work that we will pursue as time allows.
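One possible formulation, sketched under the assumption of a joint audio-text embedding space (the encoders are hypothetical placeholders, not a chosen model):

```python
# Cosine similarity between audio and text embeddings in a shared space;
# low similarity flags a mismatched or potentially synthetic pair.
import torch
import torch.nn.functional as F

def coherency_score(audio_emb: torch.Tensor, text_emb: torch.Tensor) -> float:
    """Score in [-1, 1]; higher means audio and text agree more."""
    return F.cosine_similarity(audio_emb, text_emb, dim=-1).item()

# Hypothetical usage, with `audio_encoder` / `text_encoder` as placeholders:
#   score = coherency_score(audio_encoder(waveform), text_encoder(transcript))
#   keep = score >= 0.3  # threshold to be tuned on labeled validation data
```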
Prosody and Speaker Attribute Modeling: For TTS, flat, monotone audio can degrade both model performance and perceived synthesis quality. You will build a model that scores the prosodic richness of audio clips, allowing us to select more expressive training data or, at a minimum, to route such data to different training stages (e.g., SFT or RLHF).
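As a starting proxy, and only as an assumption rather than a settled metric, prosodic richness could be approximated by the variability of the fundamental frequency:

```python
# Standard deviation of the estimated F0 (in Hz) as a crude expressiveness
# score; voiced/unvoiced handling and normalization are omitted here.
import torch
import torchaudio

def prosody_score(waveform: torch.Tensor, sample_rate: int) -> float:
    """Higher F0 spread suggests more prosodic variation; monotone clips score low."""
    f0 = torchaudio.functional.detect_pitch_frequency(waveform, sample_rate)
    return f0.float().std().item()
```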
Key Research Challenges
Intelligent Data Selection vs. Filtering: Instead of simply filtering out "bad" data, can we frame this as a data selection problem? You will explore techniques such as coreset selection and active learning to intelligently select the most valuable subset of data for training a model, prioritizing samples that are high-quality, diverse, and informative. Such a subset can form the core of post-training strategies, such as SFT or RLHF for autoregressive TTS training.
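As one concrete entry point, a standard greedy k-center heuristic over sample embeddings might look like this sketch (the source of the embedding matrix is deliberately left open):

```python
# Greedy k-center selection: repeatedly add the sample farthest from the
# current selection so the subset covers the embedding space.
import torch

def greedy_kcenter(embeddings: torch.Tensor, k: int) -> list[int]:
    """Return indices of k diverse samples from an (n, d) embedding matrix."""
    selected = [0]  # arbitrary seed sample
    dists = torch.cdist(embeddings, embeddings[0:1]).squeeze(1)
    for _ in range(k - 1):
        idx = int(torch.argmax(dists))  # farthest from current selection
        selected.append(idx)
        new_d = torch.cdist(embeddings, embeddings[idx:idx + 1]).squeeze(1)
        dists = torch.minimum(dists, new_d)
    return selected
```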
The Self-Improving Pipeline: Your ultimate goal is to create a feedback loop. Can the models within the pipeline (e.g., the synthetic speech detector, the quality estimators) be periodically retrained on newly flagged and verified data? This would create a self-correcting system that becomes more accurate and robust over time.
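Structurally, one round of such a loop might look like the following sketch, in which every component name is a hypothetical placeholder:

```python
# One round of the intended feedback loop; `detector`, `verify_samples`,
# and `retrain` are hypothetical callables standing in for real components.
def self_improving_round(detector, corpus, verify_samples, retrain):
    flagged = [s for s in corpus if detector(s) > 0.5]  # suspected low quality
    verified = verify_samples(flagged)                  # human or model check
    return retrain(detector, verified)                  # detector improves
```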
Uncertainty-Aware Processing: The pipeline should not just make binary keep/drop decisions. You will design it to output confidence scores for each quality metric, allowing us to raise or lower acceptance thresholds automatically.
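A minimal sketch of what uncertainty-aware decisions could look like, assuming each quality model emits a score and a confidence (field names and the band width are illustrative assumptions):

```python
# Low-confidence estimates fall into a "review" band rather than a hard
# keep/drop cut; the band narrows as confidence grows.
from dataclasses import dataclass

@dataclass
class QualityEstimate:
    score: float       # predicted quality in [0, 1]
    confidence: float  # model confidence in [0, 1]

def decide(est: QualityEstimate, base_threshold: float = 0.5) -> str:
    margin = 0.2 * (1.0 - est.confidence)  # wider band when less confident
    if est.score >= base_threshold + margin:
        return "keep"
    if est.score <= base_threshold - margin:
        return "drop"
    return "review"
```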
Your Impact
Your work will directly influence the performance of AGIGO’s future ASR and TTS production models. The core engine you build will be used for pretraining and fine-tuning our production speech models and will contribute to continual learning across AGIGO’s voice technology stack. You will see your research integrated into real-world systems, with your code and models directly improving the quality of our speech recognizers and voice synthesis.
We are, furthermore, fully open to discussing enhancements, modifications, or expansions of the project scope based on your research insights, interests, and expertise.
What You Bring
Required
- Master's student (preferred) or PhD student in Computer Science, Machine Learning, or a related field
- Strong Python programming skills and proficiency with Git
- Solid understanding of ML fundamentals and MLOps
- Hands-on experience with PyTorch
- Fluent in English, highly motivated, and eager to learn
Plus Points
- Experience with Hugging Face models (for LLMs, ASR, or "speech-LLMs")
- Hands-on experience with large-scale data processing pipelines
- Hands-on experience with audio AI (ASR/TTS) model training and development
What You Will Gain
- Direct product impact: your research and code used in AGIGO’s production platform
- Mentorship: work closely with our expert team of researchers and engineers
- Top-tier AI infrastructure: access to GPU clusters with NVIDIA Hopper (H200) and Blackwell RTX GPUs
- Research visibility: we will actively support you in publishing your work at a top-tier conference or in a journal
- Disciplined and inspiring research environment: a team of sharp minds grounded in expertise, autonomy, and a shared pursuit of impactful breakthroughs
- Paid internship: market-level salary, flexible hours, and free coffee, drinks, fruit, and snacks
- Career path: this internship may lead to a full-time permanent role in AGIGO's world-class AI R&D team
How to Apply
To apply, please send your resume and a brief introduction to internships@agigo.ai with the subject line:
Research Internship – Self-Improving Data Processing Engine – [Your Full Name].
By submitting your application, you agree to allow AGIGO to store and process your data for recruitment purposes. Unless otherwise requested, we may retain your data for up to one year to consider you for this or other future opportunities.
AGIGO™ is a registered trademark of AGIGO AG, Switzerland.