The design can run a big neural network more efficiently than banks of GPUs wired together. But manufacturing and running the chip is a challenge, requiring new methods for etching silicon features, a design that includes redundancies to account for manufacturing flaws, and a novel water system to keep the giant chip chilled.
To build a cluster of WSE-2 chips capable of running AI models of record size, Cerebras had to solve another engineering challenge: how to get data in and out of the chip efficiently. Regular chips have their own memory on board, but Cerebras developed an off-chip memory box called MemoryX. The company also created software that allows a neural network to be partially stored in that off-chip memory, with only the computations shuttled over to the silicon chip. And it built a hardware and software system called SwarmX that wires everything together.
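As a rough illustration of the idea of keeping a network's parameters off-chip and bringing in only what is needed for the current computation, here is a minimal sketch. It is not Cerebras' software: `ParameterStore`, `fetch`, and `streamed_forward` are hypothetical names, the tiny layer sizes are made up, and fetching one layer at a time is an assumption about how such a scheme might work.

```python
import random


class ParameterStore:
    """Hypothetical stand-in for an off-chip memory box that holds every layer's weights."""

    def __init__(self, layer_sizes, seed=0):
        rng = random.Random(seed)
        # One weight matrix (list of rows) per pair of adjacent layer sizes.
        self._weights = [
            [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
        ]

    def num_layers(self):
        return len(self._weights)

    def fetch(self, index):
        # In a real system this would be a transfer over an interconnect.
        return self._weights[index]


def relu_matvec(weights, vector):
    """Multiply a weight matrix by a vector, then apply ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]


def streamed_forward(store, inputs):
    """Run a forward pass while holding only one layer's weights at a time."""
    activations = inputs
    peak_resident = 0  # largest number of weights held "on chip" at once
    for i in range(store.num_layers()):
        layer = store.fetch(i)
        peak_resident = max(peak_resident, sum(len(row) for row in layer))
        activations = relu_matvec(layer, activations)
    return activations, peak_resident


store = ParameterStore([4, 8, 8, 2])
outputs, peak = streamed_forward(store, [1.0, 0.5, -0.5, 2.0])
total = sum(len(row) for i in range(store.num_layers()) for row in store.fetch(i))
print(f"weights in store: {total}, most held at once: {peak}")
```

In this toy setup the compute side never holds more than the largest single layer, even though the store contains the whole model.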
“They can improve the scalability of training to huge dimensions, beyond what anybody is doing today,” says Mike Demler, a senior analyst with the Linley Group and a senior editor of The Microprocessor Report.
Demler says it isn’t yet clear how much of a market there will be for the cluster, especially since some potential customers are already designing their own, more specialized chips in-house. He adds that the real performance of the chip, in terms of speed, efficiency, and cost, is as yet unclear. Cerebras hasn’t published any benchmark results so far.
“There’s a lot of impressive engineering in the new MemoryX and SwarmX technology,” Demler says. “But just like the processor, this is highly specialized stuff; it only makes sense for training the very largest models.”
Cerebras’ chips have so far been adopted by labs that need supercomputing power. Early customers include Argonne National Laboratory, Lawrence Livermore National Laboratory, pharma companies including GlaxoSmithKline and AstraZeneca, and what Feldman describes as “military intelligence” organizations.
This shows that the Cerebras chip can be used for more than just powering neural networks; the computations these labs run involve similarly massive parallel mathematical operations. “And they’re always thirsty for more compute power,” says Demler, who adds that the chip could conceivably become important for the future of supercomputing.
David Kanter, an analyst with Real World Technologies and executive director of MLCommons, an organization that measures the performance of different AI algorithms and hardware, says he sees a future market for much bigger AI models. “I generally tend to believe in data-centric ML [machine learning], so we want larger data sets that enable building larger models with more parameters,” Kanter says.
A new video from human rights organization Amnesty International maps the locations of more than 15,000 cameras used by the New York Police Department, both for routine surveillance and in facial-recognition searches. A 3D model shows the 200-meter range of a camera, part of a sweeping dragnet capturing the unwitting movements of nearly half of the city’s residents, putting them at risk for misidentification. The group says it is the first to map the locations of that many cameras in the city.
Amnesty International and a team of volunteer researchers mapped cameras that can feed NYPD’s much criticized facial-recognition systems in three of the city’s five boroughs—Manhattan, Brooklyn, and the Bronx—finding 15,280 in total. Brooklyn is the most surveilled, with over 8,000 cameras.
“You are never anonymous,” says Matt Mahmoudi, the AI researcher leading the project. The NYPD has used the cameras in almost 22,000 facial-recognition searches since 2017, according to NYPD documents obtained by the Surveillance Technology Oversight Project, a New York privacy group.
“Whether you’re attending a protest, walking to a particular neighborhood, or even just grocery shopping, your face can be tracked by facial-recognition technology using imagery from thousands of camera points across New York,” Mahmoudi says.
The cameras are often placed on top of buildings, on street lights, and at intersections. The city itself owns thousands of cameras; in addition, private businesses and homeowners often grant access to police.
Police can compare faces captured by these cameras to criminal databases to search for potential suspects. Earlier this year, the NYPD was required to disclose the details of its facial-recognition systems for public comment. But those disclosures didn’t include the number or location of cameras, or any details of how long data is retained or with whom data is shared.
The Amnesty International team found that the cameras are often clustered in majority nonwhite neighborhoods. NYC’s most surveilled neighborhood is East New York, Brooklyn, where the group found 577 cameras in less than 2 square miles. More than 90 percent of East New York’s residents are nonwhite, according to city data.
Facial-recognition systems often perform less accurately on darker-skinned people than lighter-skinned people. In 2016, Georgetown University researchers found that police departments across the country used facial recognition to identify nonwhite potential suspects more than their white counterparts.
In a statement, an NYPD spokesperson said the department never arrests anyone “solely on the basis of a facial-recognition match,” and only uses the tool to investigate “a suspect or suspects related to the investigation of a particular crime.”
“Where images are captured at or near a specific crime, comparison of the image of a suspect can be made against a database that includes only mug shots legally held in law enforcement records based on prior arrests,” the statement reads.
Amnesty International is releasing the map and accompanying videos as part of its #BantheScan campaign urging city officials to ban police use of the tool ahead of the city’s mayoral primary later this month. In May, Vice asked mayoral candidates if they’d support a ban on facial recognition. Most didn’t respond to the inquiry. Candidate Dianne Morales told the publication she supported a ban, while candidates Shaun Donovan and Andrew Yang suggested auditing for disparate impact before deciding on any regulation.
In recent years, researchers have used artificial intelligence to improve translation between programming languages or automatically fix problems. The AI system DrRepair, for example, has been shown to solve most issues that spawn error messages. But some researchers dream of the day when AI can write programs based on simple descriptions from non-experts.
On Tuesday, Microsoft and OpenAI shared plans to bring GPT-3, one of the world’s most advanced models for generating text, to programming based on natural language descriptions. This is the first commercial application of GPT-3 undertaken since Microsoft invested $1 billion in OpenAI last year and gained exclusive licensing rights to GPT-3.
“If you can describe what you want to do in natural language, GPT-3 will generate a list of the most relevant formulas for you to choose from,” said Microsoft CEO Satya Nadella in a keynote address at the company’s Build developer conference. “The code writes itself.”
Microsoft VP Charles Lamanna told WIRED the sophistication offered by GPT-3 can help people tackle complex challenges and empower people with little coding experience. GPT-3 will translate natural language into Power Fx, a fairly simple programming language similar to Excel commands that Microsoft introduced in March.
This is the latest demonstration of applying AI to coding. Last year at Microsoft’s Build, OpenAI CEO Sam Altman demoed a language model fine-tuned with code from GitHub that automatically generates lines of Python code. As WIRED detailed last month, startups like SourceAI are also using GPT-3 to generate code. IBM last month showed how its Project CodeNet, with 14 million code samples from more than 50 programming languages, could reduce the time needed to update a program with millions of lines of Java code for an automotive company from one year to one month.
Microsoft’s new feature is based on a neural network architecture known as Transformer, used by big tech companies including Baidu, Google, Microsoft, Nvidia, and Salesforce to create large language models using text training data scraped from the web. These language models continually grow larger. The largest version of Google’s BERT, a language model released in 2018, had 340 million parameters, the adjustable values that serve as building blocks of neural networks. GPT-3, which was released one year ago, has 175 billion parameters.
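For a sense of scale, the jump between the two models cited above can be worked out directly. The short calculation below uses only the parameter counts quoted in this article; the variable names are illustrative.

```python
# Parameter counts as quoted in the article.
bert_large_params = 340_000_000      # largest version of BERT (2018)
gpt3_params = 175_000_000_000        # GPT-3

# How many times larger GPT-3 is, by parameter count.
ratio = gpt3_params / bert_large_params
print(f"GPT-3 has roughly {ratio:.0f} times as many parameters as the largest BERT")
```

By that measure, GPT-3 is about 515 times the size of the largest BERT.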
Such efforts have a long way to go, however. In one recent test, the best model succeeded only 14 percent of the time on introductory programming challenges compiled by a group of AI researchers.