FriendliAI — founded by the researcher behind continuous batching, the technique at the core of vLLM — is launching InferenceSense, a platform that fills idle neocloud GPU capacity with paid AI ...
While previous embedding models were largely restricted to text, this new model natively integrates text, images, video, audio, and documents into a single numerical space — reducing latency by as muc ...