To Build a Better AI Supercomputer, Let There Be Light

GlobalFoundries, a company that makes chips for others, including AMD and General Motors, previously announced a partnership with Lightmatter. Harris says his company is “working with the largest semiconductor companies in the world as well as the hyperscalers,” referring to the largest cloud companies like Microsoft, Amazon, and Google.

If Lightmatter or another company can reinvent the wiring of giant AI projects, a key bottleneck in the development of smarter algorithms might fall away. The use of more computation was fundamental to the advances that led to ChatGPT, and many AI researchers see the further scaling-up of hardware as being crucial to future advances in the field—and to hopes of ever reaching the vaguely-specified goal of artificial general intelligence, or AGI, meaning programs that can match or exceed biological intelligence in every way.

Linking a million chips together with light might allow for algorithms several generations beyond today’s cutting edge, says Lightmatter’s CEO Nick Harris. “Passage is going to enable AGI algorithms,” he confidently suggests.

The large data centers that are needed to train giant AI algorithms typically consist of racks filled with tens of thousands of computers running specialized silicon chips and a spaghetti of mostly electrical connections between them. Maintaining training runs for AI across so many systems—all connected by wires and switches—is a huge engineering undertaking. Converting between electronic and optical signals also places fundamental limits on chips’ abilities to run computations as one.

Lightmatter’s approach is designed to simplify the tricky traffic inside AI data centers. “Normally you have a bunch of GPUs, and then a layer of switches, and a layer of switches, and a layer of switches, and you have to traverse that tree” to communicate between two GPUs, Harris says. In a data center connected by Passage, Harris says, every GPU would have a high-speed connection to every other chip.
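To make the contrast concrete, here is a minimal sketch of how many switches a message crosses in an idealized switch tree, compared with a direct chip-to-chip link. The function names, the `radix` parameter, and the tree layout are illustrative assumptions, not a description of Passage or of any real data center.

```python
def switch_hops(gpu_a: int, gpu_b: int, radix: int) -> int:
    """Count switches on the path between two GPUs in an idealized switch tree.

    Hypothetical model: each leaf switch serves `radix` GPUs, and each
    higher-level switch serves `radix` switches from the level below.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    if gpu_a == gpu_b:
        return 0
    levels_up = 0
    while gpu_a != gpu_b:
        # Move both endpoints up to their parent switch.
        gpu_a //= radix
        gpu_b //= radix
        levels_up += 1
    # Climb `levels_up` switches, then descend through `levels_up - 1` more.
    return 2 * levels_up - 1


def direct_links(n_chips: int) -> int:
    """Point-to-point links needed if every chip connects to every other."""
    return n_chips * (n_chips - 1) // 2


if __name__ == "__main__":
    # Two GPUs under different top-level branches of a radix-4 tree.
    print(switch_hops(0, 16, radix=4))
    # Links in a literal full mesh of 1,000 chips.
    print(direct_links(1000))
```

In this toy model, the number of switches grows with the depth of the tree, while a direct connection crosses none. The trade-off is that a literal full mesh needs a number of links that grows with the square of the chip count.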

Lightmatter’s work on Passage is an example of how AI’s recent flourishing has inspired companies large and small to try to reinvent key hardware behind advances like OpenAI’s ChatGPT. Nvidia, the leading supplier of GPUs for AI projects, held its annual conference last month, where CEO Jensen Huang unveiled the company’s latest chip for training AI: a GPU called Blackwell. Nvidia will sell the GPU in a “superchip” consisting of two Blackwell GPUs and a conventional CPU processor, all connected using the company’s new high-speed communications technology called NVLink-C2C.

The chip industry is famous for finding ways to wring more computing power from chips without making them larger, but Nvidia chose to buck that trend. The Blackwell GPUs inside the company’s superchip are twice as powerful as their predecessors but are made by bolting two chips together, meaning they consume much more power. That trade-off, in addition to Nvidia’s efforts to glue its chips together with high-speed links, suggests that upgrades to other key components for AI supercomputers, like that proposed by Lightmatter, could become more important.

The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge

Electrical engineer Gilbert Herrera was appointed research director of the US National Security Agency in late 2021, just as an AI revolution was brewing inside the US tech industry.

The NSA, sometimes jokingly said to stand for No Such Agency, has long hired top math and computer science talent. Its technical leaders have been early and avid users of advanced computing and AI. And yet when Herrera spoke with me by phone about the implications of the latest AI boom from NSA headquarters in Fort Meade, Maryland, it seemed that, like many others, the agency has been stunned by the recent success of the large language models behind ChatGPT and other hit AI products. The conversation has been lightly edited for clarity and length.

Person in a suit smiling in front of the American and National Security Agency flags

Gilbert Herrera. Courtesy of National Security Agency

How big of a surprise was the ChatGPT moment to the NSA?

Oh, I thought your first question was going to be “what did the NSA learn from the Ark of the Covenant?” That’s been a recurring one since about 1939. I’d love to tell you, but I can’t.

What I think everybody learned from the ChatGPT moment is that if you throw enough data and enough computing resources at AI, these emergent properties appear.

The NSA really views artificial intelligence as at the frontier of a long history of using automation to perform our missions with computing. AI has long been viewed as ways that we could operate smarter and faster and at scale. And so we’ve been involved in research leading to this moment for well over 20 years.

Large language models have been around long before generative pretrained (GPT) models. But this “ChatGPT moment”—once you could ask it to write a joke, or once you can engage in a conversation—that really differentiates it from other work that we and others have done.

The NSA and its counterparts among US allies have occasionally developed important technologies before anyone else but kept it a secret, like public key cryptography in the 1970s. Did the same thing perhaps happen with large language models?

At the NSA we couldn’t have created these big transformer models, because we could not use the data. We cannot use US citizens’ data. Another thing is the budget. I listened to a podcast where someone shared a Microsoft earnings call, and they said they were spending $10 billion a quarter on platform costs. [The total US intelligence budget in 2023 was $100 billion.]

It really has to be people that have enough money for capital investment that is tens of billions and [who] have access to the kind of data that can produce these emergent properties. And so it really is the hyperscalers [largest cloud companies] and potentially governments that don’t care about personal privacy, don’t have to follow personal privacy laws, and don’t have an issue with stealing data. And I’ll leave it to your imagination as to who that may be.

Doesn’t that put the NSA—and the United States—at a disadvantage in intelligence gathering and processing?

I’ll push back a little bit: It doesn’t put us at a big disadvantage. We kind of need to work around it, and I’ll come to that.

It’s not a huge disadvantage for our responsibility, which is dealing with nation-state targets. If you look at other applications, it may make it more difficult for some of our colleagues that deal with domestic intelligence. But the intelligence community is going to need to find a path to using commercial language models and respecting privacy and personal liberties. [The NSA is prohibited from collecting domestic intelligence, although multiple whistleblowers have warned that it does scoop up US data.]

Selective Forgetting Can Help AI Learn Better

The original version of this story appeared in Quanta Magazine.

A team of computer scientists has created a nimbler, more flexible type of machine learning model. The trick: It must periodically forget what it knows. And while this new approach won’t displace the huge models that undergird the biggest apps, it could reveal more about how these programs understand language.

The new research marks “a significant advance in the field,” said Jea Kwon, an AI engineer at the Institute for Basic Science in South Korea.

The AI language engines in use today are mostly powered by artificial neural networks. Each “neuron” in the network is a mathematical function that receives signals from other such neurons, runs some calculations, and sends signals on through multiple layers of neurons. Initially the flow of information is more or less random, but through training, the information flow between neurons improves as the network adapts to the training data. If an AI researcher wants to create a bilingual model, for example, she would train the model with a big pile of text from both languages, which would adjust the connections between neurons in such a way as to relate the text in one language with equivalent words in the other.

But this training process takes a lot of computing power. If the model doesn’t work very well, or if the user’s needs change later on, it’s hard to adapt it. “Say you have a model that has 100 languages, but imagine that one language you want is not covered,” said Mikel Artetxe, a coauthor of the new research and founder of the AI startup Reka. “You could start over from scratch, but it’s not ideal.”

Artetxe and his colleagues have tried to circumvent these limitations. A few years ago, Artetxe and others trained a neural network in one language, then erased what it knew about the building blocks of words, called tokens. These are stored in the first layer of the neural network, called the embedding layer. They left all the other layers of the model alone. After erasing the tokens of the first language, they retrained the model on the second language, which filled the embedding layer with new tokens from that language.
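The erase-and-retrain step can be sketched in a few lines. This is a toy illustration only: the dictionary layout, the function names, and the random initialization are assumptions made for the example, not the researchers’ code, and the retraining on the second language is left out.

```python
import random


def make_model(vocab_size: int, dim: int, n_layers: int, seed: int = 0) -> dict:
    """Build a toy model: an embedding table plus a stack of deeper layers."""
    rng = random.Random(seed)

    def matrix(rows: int, cols: int) -> list:
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    return {
        "embedding": matrix(vocab_size, dim),  # one vector per token
        "body": [matrix(dim, dim) for _ in range(n_layers)],
    }


def forget_embeddings(model: dict, new_vocab_size: int, seed: int = 1) -> dict:
    """Replace the embedding layer with fresh random vectors.

    The deeper layers are reused as-is, mirroring the idea that only the
    token-specific first layer is erased before retraining.
    """
    rng = random.Random(seed)
    dim = len(model["embedding"][0])
    fresh = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
             for _ in range(new_vocab_size)]
    return {"embedding": fresh, "body": model["body"]}


if __name__ == "__main__":
    first_language = make_model(vocab_size=10, dim=4, n_layers=2)
    second_language = forget_embeddings(first_language, new_vocab_size=7)
    # The deeper layers are the same objects; only the embeddings changed.
    print(second_language["body"] is first_language["body"])
```

After this reset, training on text in the second language would fill the new embedding table while the deeper layers start from what they had already learned.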

Even though the model contained mismatched information, the retraining worked: The model could learn and process the new language. The researchers surmised that while the embedding layer stored information specific to the words used in the language, the deeper levels of the network stored more abstract information about the concepts behind human languages, which then helped the model learn the second language.

“We live in the same world. We conceptualize the same things with different words” in different languages, said Yihong Chen, the lead author of the recent paper. “That’s why you have this same high-level reasoning in the model. An apple is something sweet and juicy, instead of just a word.”

Air Canada Has to Honor a Refund Policy Its Chatbot Made Up

After months of resisting, Air Canada was forced to give a partial refund to a grieving passenger who was misled by an airline chatbot inaccurately explaining the airline’s bereavement travel policy.

On the day Jake Moffatt’s grandmother died, Moffatt immediately visited Air Canada’s website to book a flight from Vancouver to Toronto. Unsure of how Air Canada’s bereavement rates worked, Moffatt asked Air Canada’s chatbot to explain.

The chatbot provided inaccurate information, encouraging Moffatt to book a flight immediately and then request a refund within 90 days. In reality, Air Canada’s policy explicitly stated that the airline will not provide refunds for bereavement travel after the flight is booked. Moffatt dutifully attempted to follow the chatbot’s advice and request a refund but was shocked that the request was rejected.

Moffatt tried for months to convince Air Canada that a refund was owed, sharing a screenshot from the chatbot that clearly claimed:

If you need to travel immediately or have already travelled and would like to submit your ticket for a reduced bereavement rate, kindly do so within 90 days of the date your ticket was issued by completing our Ticket Refund Application form.

Air Canada argued that because the chatbot response elsewhere linked to a page with the actual bereavement travel policy, Moffatt should have known bereavement rates could not be requested retroactively. Instead of a refund, the best Air Canada would do was to promise to update the chatbot and offer Moffatt a $200 coupon to use on a future flight.

Unhappy with this resolution, Moffatt refused the coupon and filed a small claims complaint in Canada’s Civil Resolution Tribunal.

According to Air Canada, Moffatt never should have trusted the chatbot and the airline should not be liable for the chatbot’s misleading information because, Air Canada essentially argued, “the chatbot is a separate legal entity that is responsible for its own actions,” a court order said.

Experts told the Vancouver Sun that Moffatt’s case appeared to be the first time a Canadian company tried to argue that it wasn’t liable for information provided by its chatbot.

Tribunal member Christopher Rivers, who decided the case in favor of Moffatt, called Air Canada’s defense “remarkable.”

“Air Canada argues it cannot be held liable for information provided by one of its agents, servants, or representatives—including a chatbot,” Rivers wrote. “It does not explain why it believes that is the case” or “why the webpage titled ‘Bereavement travel’ was inherently more trustworthy than its chatbot.”

Further, Rivers found that Moffatt had “no reason” to believe that one part of Air Canada’s website would be accurate and another would not.

Air Canada “does not explain why customers should have to double-check information found in one part of its website on another part of its website,” Rivers wrote.

In the end, Rivers ruled that Moffatt was entitled to a partial refund of $650.88 in Canadian dollars off the original fare (about $482 USD), which was $1,640.36 CAD (about $1,216 USD), as well as additional damages to cover interest on the airfare and Moffatt’s tribunal fees.

Air Canada told Ars it will comply with the ruling and considers the matter closed.

Air Canada’s Chatbot Appears to Be Disabled

When Ars visited Air Canada’s website on Friday, there appeared to be no chatbot support available, suggesting that Air Canada has disabled the chatbot.

Air Canada did not respond to Ars’ request to confirm whether the chatbot is still part of the airline’s online support offerings.

OpenAI and Other Tech Giants Will Have to Warn the US Government When They Start New AI Projects

When OpenAI’s ChatGPT took the world by storm last year, it caught many power brokers in both Silicon Valley and Washington, DC, by surprise. The US government should now get advance warning of future AI breakthroughs involving large language models, the technology behind ChatGPT.

The Biden administration is preparing to use the Defense Production Act to compel tech companies to inform the government when they train an AI model using a significant amount of computing power. The rule could take effect as soon as next week.

The new requirement will give the US government access to key information about some of the most sensitive projects inside OpenAI, Google, Amazon, and other tech companies competing in AI. Companies will also have to provide information on safety testing being done on their new AI creations.

OpenAI has been coy about how much work has been done on a successor to its current top offering, GPT-4. The US government may be the first to know when work or safety testing really begins on GPT-5. OpenAI did not immediately respond to a request for comment.

“We’re using the Defense Production Act, which is authority that we have because of the president, to do a survey requiring companies to share with us every time they train a new large language model, and share with us the results—the safety data—so we can review it,” Gina Raimondo, US secretary of commerce, said Friday at an event held at Stanford University’s Hoover Institution. She did not say when the requirement will take effect or what action the government might take on the information it received about AI projects. More details are expected to be announced next week.

The new rules are being implemented as part of a sweeping White House executive order issued last October. The executive order gave the Commerce Department a deadline of January 28 to come up with a scheme whereby companies would be required to inform US officials of details about powerful new AI models in development. The order said those details should include the amount of computing power being used, information on the ownership of data being fed to the model, and details of safety testing.

The October order calls for work to begin on defining when AI models should require reporting to the Commerce Department but sets an initial bar of 100 septillion (10^26, where a septillion is a million billion billion) floating-point operations per second, or flops, and a level 1,000 times lower for large language models working on DNA sequencing data. Neither OpenAI nor Google has disclosed how much computing power they used to train their most powerful models, GPT-4 and Gemini, respectively, but a Congressional Research Service report on the executive order suggests that 10^26 flops is slightly beyond what was used to train GPT-4.
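The arithmetic behind those two bars is easy to check. The sketch below is illustrative only: the names are made up for the example, the compute figure is taken in whatever unit the rule finally adopts, and treating the bar as "strictly greater than" is an assumption rather than something stated in the article.

```python
SEPTILLION = 10**24  # a million billion billion

GENERAL_THRESHOLD = 100 * SEPTILLION        # 100 septillion, i.e. 10**26
DNA_THRESHOLD = GENERAL_THRESHOLD // 1000   # 1,000 times lower, i.e. 10**23


def must_report(training_compute: int, dna_sequence_model: bool = False) -> bool:
    """Return True if a training run would exceed the initial reporting bar.

    Assumption: a run is reportable only when it is strictly above the bar.
    """
    threshold = DNA_THRESHOLD if dna_sequence_model else GENERAL_THRESHOLD
    return training_compute > threshold


if __name__ == "__main__":
    run = 2 * 10**25  # a hypothetical training run
    print(must_report(run))                           # general model
    print(must_report(run, dna_sequence_model=True))  # DNA sequence model
```

The same hypothetical run falls under the general bar but well over the lower bar for models trained on DNA sequencing data.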

Raimondo also confirmed that the Commerce Department will soon implement another requirement of the October executive order requiring cloud computing providers such as Amazon, Microsoft, and Google to inform the government when a foreign company uses their resources to train a large language model. Foreign projects must be reported when they cross the same initial threshold of 100 septillion flops.