§ Private Profile · San Francisco, CA, USA
Data generation company providing expert-quality coding data at scale for LLM fine-tuning and evaluation for AI companies.
Datacurve has raised $18.5M across 3 funding rounds.
Key people at Datacurve.
Datacurve was founded in 2024 by Charley Lee (Founder) and Serena Ge (Founder).
Based in San Francisco, California, Datacurve is a data infrastructure company that generates expert-quality coding data at scale for fine-tuning and evaluating large language models. The company utilizes a gamified bounty system to attract skilled software engineers who complete technical challenges to produce high-value datasets for artificial intelligence training. Operating with a core team of four employees, the business reached $1 million in annual recurring revenue within its first six months and has distributed over $1 million in bounties to its network of developers. Datacurve has secured $17.7 million in total funding, highlighted by a $15 million Series A round led by Chemistry, alongside investments from Y Combinator, Afore Capital, and Balaji Srinivasan. Providing post-training data collection services to major artificial intelligence firms, the enterprise was founded in 2024 by Serena Ge and Charley Lee.
Datacurve has raised $18.5M across 3 funding rounds. Most recently, it raised a $15.0M Series A in October 2025.
| Date | Round | Lead Investors | Other Investors | Status |
|---|---|---|---|---|
| Oct 1, 2025 | $15M Series A | Chemistry VC | Afore Capital, SignalFire, Wing Venture Capital | Announced |
| Mar 1, 2025 | $3M Seed | — | Nir Eyal, Tom Blomfield | Announced |
| Feb 1, 2024 | $500K Seed | — | Afore Capital, Andreessen Horowitz, Bullish, Comal Ventures, Inovia Capital, Jude Gomila Rolling Fund, Northside Ventures, Weekend Fund, Wing Venture Capital, Y Combinator, Ed Baker | Announced |
Datacurve's investors include Chemistry VC, Afore Capital, SignalFire, Wing Venture Capital, Nir Eyal, Tom Blomfield, Andreessen Horowitz, Bullish, Comal Ventures, iNovia Capital, Jude Gomila Rolling Fund, Northside Ventures.
Datacurve is a specialized data factory that creates high-quality coding datasets specifically designed for training and evaluating large language models (LLMs) focused on software development tasks. It serves AI companies and research labs by identifying weaknesses in their models through private benchmarks and then orchestrating targeted data collection projects via a gamified bounty platform where over 14,000 vetted software engineers compete to produce complex coding data. This approach addresses the growing need for expert-level, domain-specific data beyond generic labeling services, enabling improved model performance in coding tasks such as algorithm challenges, debugging, and multimodal UI understanding. Datacurve’s business model is B2B, generating revenue from custom dataset contracts tailored to specific model weaknesses, thus impacting the AI startup ecosystem by providing critical infrastructure for advanced model training and evaluation[1][2][3].
Datacurve was co-founded in 2024 by Serena Ge and Charley Lee. It raised seed funding followed by a $15 million Series A led by Chemistry, with notable investors from DeepMind, Anthropic, and OpenAI. The founders recognized the increasing complexity of AI training data needs, especially for software engineering tasks that require deep expertise. They developed a “bounty hunter” system that attracts skilled engineers by gamifying data creation, focusing on user experience rather than just financial incentives. This model emerged from the observation that as AI models mature, the remaining data gaps are highly specialized and require expert contributions, which traditional crowdsourcing cannot efficiently fill. Early traction includes distributing over $1 million in bounties and building a platform that integrates with major ML training pipelines[1][3].
Datacurve rides the trend of increasing specialization and sophistication in AI training data, particularly for coding and software development models. As LLMs evolve, simple datasets no longer suffice; complex reinforcement learning environments and domain-specific data are essential. The timing is critical because the AI industry is shifting from broad pretraining to targeted post-training data collection to address nuanced model failures. Datacurve’s approach influences the ecosystem by setting new standards for data quality and developer engagement, potentially expanding beyond software engineering into other expert domains like finance or medicine. Its platform also exemplifies how gamification and expert networks can solve the challenge of sourcing high-quality, specialized training data at scale[1][3].
Datacurve is positioned to become a key infrastructure provider for next-generation AI coding models by scaling its expert-driven data factory and expanding its bounty platform. Future trends shaping its journey include the growing demand for reinforcement learning from human feedback (RLHF) data, multimodal AI capabilities, and the need for proprietary, realistic codebases in training. As AI models become more agentic and interactive, Datacurve’s ability to produce complex, scenario-based datasets will be increasingly valuable. Its influence may grow by extending its model to other specialized fields and by deepening integration with AI research workflows, potentially becoming a cornerstone in the AI data supply chain. This aligns with its mission to scale the future of AI coding abilities through quality and innovation[1][2][3].