In 2019, when Microsoft Corporation invested $1 billion in OpenAI, the developer of ChatGPT, Microsoft agreed to build a supercomputer for the startup's artificial intelligence research. The only problem, according to an article by Bloomberg reporter Dina Bass, was that Microsoft was not sure it could build something that large on its Azure cloud without it crashing.
At the time, according to Bass' article, OpenAI was training a growing set of AI programs, called models, that consumed huge amounts of data and learned ever more parameters, the variables the AI system extracts through repeated rounds of training and retraining.
This meant that OpenAI needed access to powerful cloud computing services for long periods of time and at constant speeds.
Manufacturing and assembly challenges
To meet this challenge, Microsoft had to find ways to assemble tens of thousands of Nvidia A100 graphics chips, the backbone of training AI models, and keep them working together.
"We built a system architecture that could operate and be reliable at a very large scale, and that's what made ChatGPT possible," said Nidhi Chappell, a Microsoft executive responsible for Azure AI infrastructure.
When OpenAI or Microsoft trains a large AI model, millions of computational operations happen simultaneously and, according to Bass, the units "need to talk to each other to share and learn from the work they've done."
With all the hardware running at the same time, Microsoft had to think about where to put the chips and where to find the power supply.
Alistair Speirs, who works on Azure's global infrastructure, said the company also had to make sure it could cool all of these machines and chips.
This technology allowed OpenAI to launch ChatGPT, which attracted more than 1 million users within days of its public debut in November 2022 and is now being folded into other companies' business models.
Among those adopting the technology, Bass says, are hedge funds, banking services, and food delivery companies.
Going forward, according to Bass, this will put more pressure on cloud providers such as Microsoft, Amazon, and Google to ensure their data centers can deliver the massive computing power required.
A cloud service depends on thousands of different parts and components, and a delay or lack of any one component, however small, can affect everything.