Usage Guidelines for ACE Custom Voice Model

Thank you for your interest in ACE Custom Voice Model!

Please make sure to read this document before using the service, to ensure that you have a clear understanding of its features, limitations, and usage guidelines.

Features and Limitations

ACE Custom Voice Model, also known as "Custom Singer", is an enhanced service provided by Timedomain within ACE Studio. It allows you to upload voice talent data, train a singing voice model, deploy the model, and use it as an AI singer within ACE Studio software. In simple terms, through this service, you can create your own AI singer from scratch and have exclusive rights to use that AI singer.

The key terms associated with this service are as follows:

Voice Talent: An individual or target singer whose voice is recorded and used to create a singing voice model, enabling the model to possess their acoustic characteristics and generate synthesized singing results similar to their voice through singing voice synthesis (SVS).
Voice Talent Data: Dry a cappella recordings of the voice talent, used to train the singing voice model to learn and mimic the voice talent's timbre and singing style.
Singing Voice Model: A computer model that can simulate the unique vocal characteristics of a target singer and convert MIDI and lyrics into singing voices. A singing voice model is a set of binary-format parameters that is not human readable and does not contain audio recordings. It cannot be reverse engineered to derive or construct the original recordings of the voice talent.
Model Training: The process of optimizing the singing voice model using algorithms and voice talent data, enabling the model to learn patterns and rules from the input data and make accurate predictions or generate outputs on unseen data based on that learning.
Model Deployment: The process of applying the trained model to a real production environment.

Features

The features of ACE Custom Voice Model include:

Automated Training

Automatically annotate the voice talent data you upload to generate the annotation files required for training the singing voice model.
Train your exclusive singing voice model using the generated annotation files and the latest version of the ACE SVS base model.

Model Evaluation and One-Click Deployment

After automated training is completed, we will provide you with one or more results to choose from, depending on the training platform you have selected.
You can adjust the singing style of the training results using Style Transfer technology and try out the training results in ACE Studio (available only on the Pro version platform).
You can then select one of the results as the AI singer for deployment.

Usage in ACE Studio

You have exclusive rights to use your custom singer, which means others cannot access or use your custom singer without your permission.

You can purchase collab seats to invite your partners to use your custom singer in ACE Studio.

Your custom singer has the following features and usage methods, just like the public AI singers in ACE Studio:
- Language Transfer: Sing in languages other than those in the voice talent data. Simply set the note language in ACE Studio and enter lyrics to have them sung in the specified language.
- Editable AI Parameters: Intervene in the synthesis of singing voices by manually adjusting the pitch and AI emotional parameters generated by SVS in ACE Studio software, allowing for the creation of synthesized vocals with unique expressiveness.
- VoiceMix (available only on the Pro version platform): You can blend the voice characteristics of your custom singer with other AI singers, creating new vocal attributes.

Please note that, regardless of whether you hold the rights to use a custom singer, you need to maintain an ACE Studio membership to use ACE Studio software.

Limitations

Please understand that model training is a relatively opaque process with many variables, which can result in unforeseen training outcomes.

To ensure smooth model training, please make sure that your voice talent data meet the following requirements:

Use a consistent voice timbre during recording and avoid mixing data from multiple voice talents.
Ensure that each audio file contains vocals performed in a single language (currently only Chinese, Japanese, and English are supported).
Each audio file should be at least 10 seconds and no more than 10 minutes long.
The audio should not contain any content other than vocals, such as accompaniment or background noise.
Upload the audio in wav, flac, mp3, or m4a format.
At least 1 audio file is required; between 30 and 100 files are recommended.
Each audio file should have a blank space of at least 2 seconds at the beginning and end.
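
The format, duration, and file-count requirements above can be checked before upload. The following is a minimal sketch only: `check_dataset` is a hypothetical helper, not part of ACE Studio, and it assumes you already know each file's duration in seconds. It does not check timbre, language, background noise, or leading and trailing silence.

```python
from pathlib import Path

# Limits copied from the requirements listed above.
ALLOWED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a"}
MIN_SECONDS = 10
MAX_SECONDS = 10 * 60
MIN_FILES = 1
RECOMMENDED_FILES = range(30, 101)  # 30 to 100 files, inclusive


def check_dataset(files):
    """Return a list of problems for (filename, duration_in_seconds) pairs.

    Hypothetical helper for illustration; an empty list means the checked
    requirements (format, duration, file count) are all met.
    """
    problems = []
    if len(files) < MIN_FILES:
        problems.append("at least 1 audio file is required")
    elif len(files) not in RECOMMENDED_FILES:
        problems.append(f"note: {len(files)} files; 30 to 100 are recommended")
    for name, seconds in files:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"{name}: duration {seconds}s is outside 10 s to 10 min")
    return problems
```

For example, `check_dataset([("take01.wav", 95.0)])` reports only the note about the recommended file count, since a single file is allowed but fewer than 30 are provided.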

To achieve optimal results in model training, it is recommended that your voice talent data meet the following additional requirements:

Use the wav audio format with a quality of 44100Hz/16Bit or higher.
Ensure each recording is a dry a cappella, free from noise and without overlapping audio.
During the recording, try to perform as expressively as possible and cover a wide range of pitches.
You may apply appropriate compression, EQ, and sound editing techniques, but avoid adding effects like reverb or delay to the dry a cappella.
Avoid significant volume differences within each audio file and apply uniform volume normalization to all recordings.
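
The recommended 44100Hz/16Bit quality can be verified for uncompressed wav files with Python's standard `wave` module. This is a sketch under stated assumptions: `meets_recommended_quality` is a hypothetical name, not an ACE Studio function, and the `wave` module reads PCM wav files only, so other formats need a different tool.

```python
import wave

# Thresholds taken from the recommendation above.
MIN_SAMPLE_RATE_HZ = 44100
MIN_BITS = 16


def meets_recommended_quality(path):
    """Return True if a PCM wav file is at least 44100 Hz and 16-bit.

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # getsampwidth() returns bytes per sample
    return sample_rate >= MIN_SAMPLE_RATE_HZ and bits >= MIN_BITS
```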

Even if your voice talent data meet the aforementioned requirements, you may still encounter the following situations:

Unexpected termination of model training.
Failure of some or all of the alternative AI singers generated as training results to meet your expectations in terms of singing performance.
Occasional imperfections such as pitch inaccuracies, vocal breaks, or voice hoarseness in the performance of the custom AI singer.
Less accurate pronunciation by the custom AI singer in language-transferred performances than in native-language performances.

By confirming the model deployment, you signify that you understand and accept the aforementioned limitations. Please consider them carefully and make an informed decision before proceeding with the payment.

Responsible Use

SVS is an emerging technology that is rapidly advancing. On one hand, this technology has tremendous potential to help music creators break the limitations of vocal abilities and unleash greater musical imagination. On the other hand, it allows people to manipulate a singing voice model to produce new content that emulates specific human vocal traits, and the misuse of it can potentially cause harm.

When using the ACE Custom Voice Model to create your own AI singer, it is important to be aware that as the creator of the custom singer project, you have a responsibility to ensure that your custom singer project complies with laws, regulations, and existing social norms, and to avoid any misuse of your custom singer. While we recognize that there is currently no perfect method to prevent the misuse of SVS, we still aim to maximize the responsible use of ACE Custom Voice Model through the following code of conduct:

Legality of Dataset: You need to ensure that the voice talent data used to create the custom singer is obtained legally. If any concerns are raised regarding the legality of the voice talent data you have used, we may request additional information from you through the contact details provided during the real-name registration process. This may include, but is not limited to, explicit written permission and/or a recording of verbal informed consent from the voice talent regarding the use of their voice. If you are unable to provide relevant documentation to prove the legality of the voice talent data, we reserve the right to unilaterally terminate the provision of Custom Voice Models to you, and any fees paid will not be refunded. Therefore, we recommend the following when creating a voice talent dataset:
- Sign a proper authorization agreement with the voice talent.
- Send the "Disclosure for Voice Talent" document to the voice talent to help them understand how their voice will be used for SVS, so they can evaluate any potential risks and make a more informed decision about providing their voice.
- Request the voice talent to verbally state and record the following: "I have read the 'Disclosure for Voice Talent' document from ACE Studio, understand the implications of my voice being used for SVS, and consent to the use of my voice for the ACE Custom Voice Model project."
  
Prohibited Content: Your custom singer must not contain, or be used for, any of the following. Failure to comply may result in our requiring you to eliminate any negative impact, unilaterally terminating the provision of Custom Voice Models to you, or terminating any ACE Studio services to you:
- Simulating the voice of politicians or government officials, even with their consent.
- Deceiving or intentionally misleading others.
- Creating, inciting, or disguising hate speech, discrimination, defamation, terrorism, or acts of violence.
- Engaging in other activities that violate laws, regulations, and existing social norms.
- Any other conduct which, in ACE Studio Team's reasonable judgment, is considered improper.

The public's understanding and perception of SVS technology will continue to evolve over time. In the future, we may periodically update or upgrade our measures to ensure that the use of SVS aligns with positive and reasonable public expectations. If you have any related questions, please contact us at support@acestudio.ai.