Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita ๐Ÿ”ฅ
Hugging Face Blog
Backend

The Hugging Face Hub now supports three additional serverless inference providers (Hyperbolic, Nebius AI Studio, and Novita), making models such as DeepSeek-R1 and Flux.1 accessible directly from their model pages.


February 18, 2025 · 7 min read · intermediate

Context

Previously, the Hugging Face Hub supported only five serverless inference providers (Together AI, Sambanova, Replicate, fal, and Fireworks.ai), which limited the choice of available models and providers.

Technical Solution

  • Integrated three new serverless inference providers: Hyperbolic, Nebius AI Studio, and Novita are wired directly into Hub model pages
  • Implemented a dual authentication scheme: users can either call the provider directly with their own API key (Custom key) or route requests through their Hugging Face account (Routed by HF)
  • Provider ordering configurable in user settings: model-page widgets and code snippets list providers in the user's preferred order
  • In the Python huggingface_hub SDK, the provider is selected via the provider parameter of InferenceClient: InferenceClient(provider="hyperbolic", api_key="...")
  • In the JavaScript @huggingface/inference SDK, a provider parameter is passed to chatCompletion: await client.chatCompletion({model: "...", provider: "novita", ...})
  • Two billing models: with a Custom key, usage is billed to the corresponding provider account; with Routed by HF, it is billed to the Hugging Face account
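The dual authentication and billing scheme above can be sketched as a small dispatch. Note that `resolve_auth` is an illustrative helper, not part of any SDK, and keying the decision off the `hf_` token prefix is an assumption made only for this sketch:

```python
# Sketch of the two call modes: "Custom key" sends your own provider API key
# and bills your provider account, while "Routed by HF" sends a Hugging Face
# token and bills your Hugging Face account.
# resolve_auth is a hypothetical helper for illustration only.

def resolve_auth(api_key: str) -> dict:
    """Decide the billing mode from the key the caller supplies."""
    if api_key.startswith("hf_"):  # Hugging Face tokens use this prefix
        return {"mode": "routed_by_hf", "billed_to": "hugging_face_account"}
    return {"mode": "custom_key", "billed_to": "provider_account"}

print(resolve_auth("hf_abc123"))    # routed through Hugging Face
print(resolve_auth("sk-provider"))  # direct call with a provider key
```

In the real SDKs this choice is made implicitly by which key you pass to InferenceClient or chatCompletion; no separate mode flag is needed.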

Impact

PRO users receive $2 worth of Inference credits every month.

Key Takeaway

By unifying serverless inference providers behind an abstraction layer, users can switch services flexibly by changing only the provider name within the same SDK code structure, and offering a choice of billing modes enables both cost optimization and control over vendor dependencies.


In applications that integrate LLM APIs, wrapping the provider in a parameterized client abstraction, as in InferenceClient(provider=&lt;variable&gt;), lets you switch between multiple third-party inference providers with a single configuration change, reducing vendor lock-in and improving model availability.
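A minimal sketch of that single-point switch, assuming a hypothetical `make_request_config` helper (in a real application the call would go through huggingface_hub's InferenceClient with the provider parameter):

```python
# Illustrative provider abstraction: the provider name lives in exactly one
# place, so swapping vendors is a one-word configuration change.
# PROVIDERS and make_request_config are hypothetical names for this sketch.

PROVIDERS = {"hyperbolic", "nebius", "novita", "together", "replicate"}

def make_request_config(provider: str, model: str) -> dict:
    """Build one request description; only `provider` varies per vendor."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return {"provider": provider, "model": model, "task": "chat_completion"}

# Switching vendors is a one-word change at a single call site:
cfg = make_request_config("novita", "deepseek-ai/DeepSeek-R1")
print(cfg["provider"])  # novita
```

Because every request is built from the same config, validating the provider name in one place also gives you a single seam for fallbacks or A/B routing between vendors.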

Read the original post