Richard Ho, the company’s chief of hardware, made the statements during his keynote presentation at the AI Infra Summit in Santa Clara.
Ho added, "It has to be built into the hardware. Today, most safety work is done in software. It assumes your hardware is secure. It presupposes that your hardware will perform correctly. It assumes you can unplug the hardware." He was not implying that the hardware cannot be disconnected, he said, but rather that the models are very devious, and as a hardware person he wants to be certain of that.
Safety precautions at the silicon level
Ho stated that the rise of generative AI necessitates a rethinking of system design, outlining how future agents will be long-lived, working in the background even when a user is not actively engaged.
This shift requires memory-rich, low-latency infrastructure to support continuous sessions and communication among multiple agents.
Ho said that networking is becoming a bottleneck. "We will need real-time tools in these, which means that these agents will be able to communicate with one another," he said. Some agents may be using a tool while others search a webpage; still others will be reasoning, and they all need to talk to one another.
Ho listed several hardware issues that need to be resolved: limits on high-bandwidth memory, the demand for 2.5D and 3D chip integration, advances in optics, and power requirements that could exceed 1 megawatt per rack.
Among the safety measures Ho proposed are secure execution paths in CPUs and accelerators, telemetry to detect signs of anomalous behavior, and real-time kill switches integrated into AI clusters.
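The talk gave no implementation detail, but the shape of such a mechanism can be sketched. The following is a minimal, hypothetical illustration, with invented names (read_telemetry, assert_kill_switch) and invented thresholds: a host-side watchdog that polls accelerator telemetry and trips an out-of-band disable line, independent of any software the model itself can influence.

```python
import time

# Hypothetical sketch of the kill-switch idea described above: a watchdog
# polls accelerator telemetry and trips a hardware disable line when readings
# leave an allowed envelope. All names, thresholds, and telemetry values are
# invented for illustration; the real mechanism would live in silicon and
# firmware, not a Python loop.

POWER_LIMIT_WATTS = 700.0   # assumed per-accelerator power envelope
IDLE_LINK_BYTE_LIMIT = 0    # traffic on links that should be silent

def read_telemetry(device_id: int) -> dict:
    """Placeholder for reading on-chip sensors (power draw, link counters)."""
    return {"power_watts": 412.0, "idle_link_bytes": 0}

def assert_kill_switch(device_id: int) -> None:
    """Placeholder for driving a dedicated hardware disable signal."""
    print(f"KILL: isolating accelerator {device_id}")

def watchdog(device_ids: list[int], ticks: int = 10, interval_s: float = 0.1) -> None:
    for _ in range(ticks):
        for dev in device_ids:
            t = read_telemetry(dev)
            if (t["power_watts"] > POWER_LIMIT_WATTS
                    or t["idle_link_bytes"] > IDLE_LINK_BYTE_LIMIT):
                # Out-of-band path: acts regardless of what the model is doing.
                assert_kill_switch(dev)
        time.sleep(interval_s)

watchdog([0, 1, 2, 3])
```

The key design property is that the disable path sits below the software the model can reach: the watchdog trusts only sensor readings, not anything the workload reports about itself.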
Ho concluded by stating that the industry needs proper benchmarks for hardware and agent-aware designs, and that it is crucial to understand latency walls and latency tails, along with power and efficiency. High observability, he added, must be a built-in characteristic of hardware that continuously monitors itself, not merely a debugging tool.
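A short, self-contained sketch shows why latency tails, as opposed to averages, matter for agent workloads; the simulated distribution below is invented purely for demonstration. The mean looks healthy, but the p99/p99.9 tail, which gates chained agent-to-agent round trips, is an order of magnitude worse.

```python
import random
import statistics

# Invented latency distribution: a fast common path plus a rare slow path
# (e.g. a retry or congestion event) hitting 1% of requests.
def simulated_request_latency_ms() -> float:
    base = random.gauss(2.0, 0.3)        # typical fast path
    if random.random() < 0.01:           # 1% of requests hit a slow path
        base += random.uniform(20, 50)
    return max(base, 0.1)

samples = sorted(simulated_request_latency_ms() for _ in range(100_000))

def percentile(data: list[float], p: float) -> float:
    return data[min(int(len(data) * p), len(data) - 1)]

print(f"mean  : {statistics.fmean(samples):6.2f} ms")
print(f"p50   : {percentile(samples, 0.50):6.2f} ms")
print(f"p99   : {percentile(samples, 0.99):6.2f} ms")  # tail dominated by slow path
print(f"p99.9 : {percentile(samples, 0.999):6.2f} ms")
```

When an agent's turn requires many such round trips in sequence, the probability of hitting the tail at least once grows quickly, which is why agent-aware benchmarks need to report tail percentiles rather than means.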
Networking, he said, is a crucial issue, and as the industry moves to optical interconnects, it is not yet certain that the network is reliable. Sufficient testing on these optical and other communication testbeds is needed to demonstrate that reliability.
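As one concrete example of what such testing could look like, here is a hypothetical soak-test sketch: it drives a known pattern over each link and compares the observed bit-error rate against a budget. The link names, budgets, error probability, and the send_pattern() stand-in are all invented; a real testbed would use transceiver-level pattern generators and error counters.

```python
import random

# Hypothetical link soak test: push a test pattern across each link and
# record the observed bit-error rate (BER). Every value here is invented
# for demonstration; send_pattern() stands in for real transceiver/NIC APIs.

LINKS = ["rack0/port0", "rack0/port1", "rack1/port0"]
BER_BUDGET = 1e-4             # assumed acceptable error-rate target
SIMULATED_ERROR_PROB = 1e-5   # injected error rate for this demo

def send_pattern(link: str, n_bits: int) -> int:
    """Placeholder: transmit a PRBS-style pattern; return bit errors observed."""
    return sum(random.random() < SIMULATED_ERROR_PROB for _ in range(n_bits))

def soak(links: list[str], n_bits: int = 1_000_000) -> dict[str, float]:
    return {link: send_pattern(link, n_bits) / n_bits for link in links}

for link, ber in soak(LINKS).items():
    status = "OK" if ber <= BER_BUDGET else "FAIL"
    print(f"{link}: BER={ber:.2e} [{status}]")
```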