X Feed Intel

789

Relevant

273

Topics

2290

Total Posts

$1.633

Cost This Week

$1.633

Total Cost

2026-02-23T23:00

Last Fetch

← Back to Topics

Inference Stack

WebSocket optimization for agentic tool-calling latency reduction

Responses API WebSocket implementation achieving 30-40% latency improvements in agent tool-calling workflows, representing inference stack optimization for agentic workloads.

8 posts · First seen 2026-02-23 · Last activity 2026-02-23

Time	Author	Post
2026-02-23T22:20	@OpenAIDevs	Teams are using WebSockets in the Responses API to speed up agentic workflows https://t.co/CNKDniYfe1
2026-02-23T21:23	@migtissera	RT @OpenAIDevs: Introducing WebSockets in the Responses API. Built for low-latency, long-running agents with heavy tool calls. https://t.…
2026-02-23T21:21	@martin_casado	RT @leerob: All OpenAI models in Cursor are now up to 30% faster! We've upgraded all users to WebSockets with their Responses API.
2026-02-23T21:14	@romainhuet	We built WebSockets support to keep up with the speed of GPT-5.3-Codex-Spark! ✨ Excited to make this available to everyone building on the platform. https://t.co/QtixmgnVQY
2026-02-23T20:16	@stevenheidel	WebSockets are the reason we were able to speed up Codex recently - across all models https://t.co/BzC6uVBz3u ↩ reply parent
2026-02-23T20:10	@stevenheidel	the Responses API now supports WebSockets! this can make your agents run 30-40% faster, especially when they make a lot of tool calls https://t.co/sBgoat2gsX https://t.co/pTPaqlKPvl
2026-02-23T20:04	@OpenAIDevs	WebSockets keep a persistent connection to the Responses API, allowing you to send only new inputs instead of round-tripping the entire context on every turn. By maintaining in-memory state across interactions, it avoids repeated work and speeds up agentic runs with 20+ tool calls by 20%-40%. ↩ reply parent
2026-02-23T20:04	@OpenAIDevs	Introducing WebSockets in the Responses API. Built for low-latency, long-running agents with heavy tool calls. https://t.co/qmOAhidk7o https://t.co/feiGpewQaE