🗃️ Available Training Datasets

We currently support four kinds of datasets:

  • Web 3.0 & AI Training Data: high-quality content related to Web3 and AI is acquired from social platforms like X, then assessed for whether it is valuable for fine-tuning large-scale models in specialized fields. Both content providers and validators can profit from this process.

  • RLHF Training Data: Reinforcement Learning from Human Feedback (RLHF) integrates human judgments to refine AI behavior. PublicAI's data consensus algorithm, rooted in cross-validation and Byzantine Fault Tolerance, offers a robust mechanism for ensuring the integrity and relevance of human feedback, which improves the efficiency and reliability of RLHF when training AI models.

  • TTS (Text-to-Speech) Data: datasets used for training text-to-speech systems, containing text, corresponding audio recordings, and phonetic and prosodic annotations. High-quality TTS data is crucial for developing natural and fluent speech synthesis systems, and its scale and diversity directly affect model performance.

  • Aesthetics Assessment Training Data: human evaluations of the beauty and artistic value of images, videos, and music, used to train machine learning models. Human consensus on aesthetics is crucial for training these models to understand diverse perspectives on beauty. PublicAI's data consensus algorithm, which emphasizes cross-validation and Byzantine Fault Tolerance, is well suited to gathering and validating a wide range of human judgments, so that AI models are trained on accurate and comprehensive assessments of aesthetics.
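
The page does not specify how PublicAI's data consensus algorithm works, so the following is only an illustrative sketch of the general idea behind Byzantine-fault-tolerant agreement on human labels: a label is accepted only when strictly more than two thirds of validators agree on it. The function name `consensus_label` and the two-thirds threshold are assumptions made for this example, not PublicAI's actual implementation.

```python
from collections import Counter
from fractions import Fraction


def consensus_label(votes, threshold=Fraction(2, 3)):
    """Return the label backed by more than `threshold` of the votes, else None.

    Hypothetical helper for illustration only; it is not PublicAI's algorithm.
    """
    if not votes:
        return None
    # Find the most frequently submitted label and how many validators chose it.
    label, count = Counter(votes).most_common(1)[0]
    # Require a strict supermajority, mirroring the usual BFT quorum rule.
    if Fraction(count, len(votes)) > threshold:
        return label
    return None


if __name__ == "__main__":
    # Five of seven validators agree, which exceeds two thirds.
    print(consensus_label(["valuable"] * 5 + ["not valuable"] * 2))
    # Two of three is exactly two thirds, so no label is accepted.
    print(consensus_label(["valuable", "valuable", "not valuable"]))
```

Using exact fractions avoids floating-point edge cases when the vote share sits exactly on the threshold. The same check could apply to any of the dataset types above, for example a validator's verdict on a social post, a preference between two model responses, or an aesthetics rating bucket.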
