According to Huawei's official announcement, the company will hold a launch event in Shenzhen today (August 23) for the Ascend 910 AI processor and the MindSpore open-source computing framework. Huawei says the event will unveil the industry's highest-performance AI processor to date, along with a full-scenario AI computing framework.
Huawei has also officially confirmed that at the event its rotating chairman Xu Zhijun will release the Ascend 910 AI processor and the MindSpore computing framework. Xu Zhijun, together with chief strategy architect Dang Wenshuan, Huawei Fellow and president of chip and hardware strategy Ai Wei, and Gu Yongli, general manager of the EI product department of Huawei's Cloud BU, will take part in a Q&A session.
Earlier, at the Huawei Connect conference in October 2018, rotating chairman Xu Zhijun laid out Huawei's AI strategy for the first time and formally announced two AI chips, the Ascend 910 and the Ascend 310. Xu said that the Ascend 910 was, at the time, the single chip with the highest computing density.
According to a report by Fast Technology, at this week's top-tier HotChips conference Huawei briefly introduced some details of the Ascend 910.
Slides presented at the conference show that the Ascend 910 is based on the Da Vinci (DaVinci) core architecture and is manufactured on an enhanced 7nm EUV process. A single die integrates 32 Da Vinci cores, delivering half-precision performance of up to 256 TFLOPS at a power consumption of 350W. The Ascend 910's compute density surpasses that of competing parts such as the NVIDIA Tesla V100 and Google's TPU v3. Huawei has also designed an AI compute server with 2048 nodes, with overall performance reaching 512 PetaFLOPS (2048 × 256).
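A quick back-of-envelope check clarifies the 512 PetaFLOPS figure. Taking the article's stated numbers (2048 nodes, 256 TFLOPS of half-precision peak per node), the total is 524,288 TFLOPS; the quoted "512" comes out exactly only if one converts tera to peta with a binary factor of 1024 rather than the SI factor of 1000. This sketch simply reproduces that arithmetic:

```python
# Back-of-envelope check of the cluster figure quoted above, using the
# article's own numbers (these are assumptions from the source, not
# independently verified specs).
nodes = 2048
tflops_per_node = 256  # stated FP16 peak per Ascend 910 node

total_tflops = nodes * tflops_per_node   # 524288 TFLOPS
decimal_pflops = total_tflops / 1000     # ~524.3 PFLOPS with SI prefixes
binary_pflops = total_tflops / 1024      # exactly 512 with a binary factor

print(total_tflops, decimal_pflops, binary_pflops)
```

So the article's "512 PetaFLOPS (2048 × 256)" is consistent with its own inputs only under the 1024 conversion; with standard SI prefixes the same inputs give roughly 524 PFLOPS.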
In addition, according to Huawei's official WeChat account, the Da Vinci architecture consists mainly of a 3D Cube matrix compute core, a Vector unit, a Scalar unit, and other components. The 3D Cube is dedicated to fast matrix operations, substantially raising AI compute per unit of power; each AI Core can perform 4096 MAC operations within a single clock cycle. Meanwhile, buffers L0A, L0B, and L0C are used to hold the input and output matrix data, and are responsible for feeding data to the Cube compute unit and storing the computed results.
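The 4096-MACs-per-cycle figure has a natural geometric reading, which is presumably where the "3D Cube" name comes from: multiplying two 16×16 matrices requires exactly 16³ = 4096 multiply-accumulate operations, so a 16×16×16 cube of MAC units can complete one such matrix multiply per cycle. The sketch below is purely illustrative (not Huawei code) and just counts the MACs in a naive 16×16 matrix multiply to confirm the arithmetic:

```python
# Illustrative sketch: one 16x16 by 16x16 matrix multiply takes exactly
# 16**3 = 4096 multiply-accumulates, matching the stated per-cycle MAC
# count of an AI Core. This is a counting demonstration, not real
# Ascend/Da Vinci code.
N = 16

def matmul_16x16(a, b):
    """Naive NxN matmul that also counts multiply-accumulate operations."""
    macs = 0
    c = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                c[i][j] += a[i][k] * b[k][j]  # one MAC per innermost step
                macs += 1
    return c, macs

identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
result, mac_count = matmul_16x16(identity, identity)
print(mac_count)  # 4096
```

In other words, under this reading the Cube completes in one clock cycle what the naive loop above does in 4096 scalar steps, which is the source of the large compute-per-watt advantage the article describes.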