An Intelligent Computing System combines Deep Learning, Parallel Programming, Computer Organization, and Computer Architecture.
Neural Network Basics
Loss function: \(L(w)=\frac{1}{2}\sum_i(H(x_i)-y_i)^2=\frac{1}{2}\sum_i(w^Tx_i-y_i)^2\)
Gradient Descent: \(w=w-\alpha\frac{\partial L(w)}{\partial w}\)
Activation Function (e.g. sigmoid, ReLU, tanh)
Back Propagation: Chain Rule \(\frac{\partial L}{\partial w}=\frac{\partial L}{\partial y}\frac{\partial y}{\partial z}\frac{\partial z}{\partial w}\)
Neural Network structure: input layer, hidden layer, output layer
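A minimal NumPy sketch of these basics, training a linear model \(H(x)=w^Tx\) with the squared loss and update rule above (the toy data, learning rate, and iteration count are all made up):

```python
import numpy as np

# Hypothetical toy data: y is roughly 2*x.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

w = np.zeros(1)
alpha = 0.005  # learning rate

for _ in range(500):
    pred = X @ w                 # H(x_i) = w^T x_i
    grad = X.T @ (pred - y)      # dL/dw for L(w) = 1/2 sum_i (H(x_i) - y_i)^2
    w -= alpha * grad            # gradient descent step
print(w)                         # converges near [2.0]
```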
CNN
convolution layer
pooling
fully connected + softmax \(f(z_j)=\frac{e^{z_j}}{\sum_ie^{z_i}}\) (see the sketch below)
e.g. AlexNet, VGG, Inception, ResNet
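A small sketch of that softmax (the max-subtraction is a standard numerical-stability trick, not something the formula requires):

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) does not change the result, since
    # e^{z_j - c} / sum_i e^{z_i - c} = e^{z_j} / sum_i e^{z_i},
    # but it avoids overflow for large logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities summing to 1
```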
How to evaluate a CNN?
- IoU, aka Jaccard index (intersection over union)
\(IoU=\frac{|A\cap B|}{|A\cup B|}\)
If \(IoU>0.5\), the localization is accepted (see the sketch after this list).
- mAP, aka mean average precision
\(mAP=\frac{\sum_{q=1}^Q AveP(q)}{Q}\)
recall\(=\frac{TP}{TP+FN}\)
precision\(=\frac{TP}{TP+FP}\)
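A small sketch of these metrics' building blocks; the (x1, y1, x2, y2) box format and the TP/FP/FN counts are hypothetical:

```python
def iou(a, b):
    # Axis-aligned boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)  # |A ∩ B| / |A ∪ B|

def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))     # 1/7 ≈ 0.14 < 0.5, so rejected
print(precision_recall(tp=8, fp=2, fn=4))  # (0.8, 0.666...)
```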
Object Detection
e.g. R-CNN, YOLO
RNN
sequence, recurrent, memory
LSTM
GRU
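A minimal NumPy sketch of one vanilla recurrent step, where the hidden state carries the memory across the sequence (LSTM and GRU replace this update with gated versions); all sizes and weights here are made up:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One recurrent step: new state mixes current input and previous state.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))       # input dim 3 -> hidden dim 4
W_hh = rng.normal(size=(4, 4))       # hidden -> hidden recurrence
b_h = np.zeros(4)

h = np.zeros(4)                      # initial state
for x_t in rng.normal(size=(5, 3)):  # a length-5 sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)                             # final state summarizes the sequence
```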
GAN
generator, discriminator
e.g. CGAN (Conditional GAN)
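A toy PyTorch sketch of the adversarial loop: the discriminator learns to label real samples 1 and generated samples 0, while the generator learns to fool it (the tiny MLPs, data distribution, and hyperparameters are all invented):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(1000):
    real = torch.randn(32, 1) * 0.5 + 2.0   # "real" data: N(2, 0.5^2)
    fake = G(torch.randn(32, 8))            # generator maps noise to samples

    # Discriminator step: push D(real) toward 1, D(fake) toward 0.
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: push D(fake) toward 1, i.e. fool the discriminator.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```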
Deep Learning Framework
Tensorflow
Computations are expressed as stateful dataflow graphs.
All data is modeled as Tensors.
Operations execute inside a Session.
The stateful dataflow graph can execute asynchronously through Queues.
Automatic differentiation
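A minimal sketch of that model using the legacy TensorFlow 1.x graph/Session API (the tiny expression \(y=wx^2\) is just an example):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Build the dataflow graph first; nothing executes yet.
x = tf.placeholder(tf.float32, shape=())
w = tf.Variable(3.0)               # stateful node
y = w * x * x
grad = tf.gradients(y, w)[0]       # automatic differentiation: dy/dw = x^2

with tf.Session() as sess:         # operations run inside a Session
    sess.run(tf.global_variables_initializer())
    print(sess.run([y, grad], feed_dict={x: 2.0}))  # [12.0, 4.0]
```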
PyTorch
- Flexible (define-by-run)
- Python and C++ interfaces
- Widely used in research
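The same toy expression in PyTorch, where the graph is built as the Python code runs (define-by-run), which is where the flexibility comes from:

```python
import torch

x = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
y = w * x * x       # graph is recorded on the fly
y.backward()        # automatic differentiation
print(w.grad)       # tensor(4.) = x^2
```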
MXNet
- Bindings for R, Julia, Go, and more
- Efficiency & flexibility
Caffe
- One of the earliest frameworks
- Lacks flexibility
- No longer maintained
Deep Learning Processor
aka deep learning accelerator
A DLP is an electronic circuit designed for deep learning algorithms, usually with separate data memory and a dedicated instruction set architecture.
Aims to optimize for:
- Data-level parallelism
- Vectorized operations
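In spirit (with NumPy standing in for a vector instruction set), the difference between a scalar loop and one vectorized operation looks like this:

```python
import numpy as np

a = np.arange(1024.0)
b = np.arange(1024.0)

acc = 0.0
for i in range(1024):   # scalar: one multiply-accumulate per iteration
    acc += a[i] * b[i]

acc_vec = np.dot(a, b)  # one vector operation covering all 1024 elements
print(acc == acc_vec)   # True
```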
DLP Instruction Set
Other accelerators
- GPU
- FPGA
Deep Learning Language
Heterogeneous computing
- Task division
- Data distribution
- Data communication
- Parallelism and synchronization
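A minimal sketch of those four concerns with PyTorch as the host-side language and a GPU as the device (the matrix sizes are arbitrary):

```python
import torch

# Task division: host prepares data, the accelerator does the heavy compute.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(1024, 1024)   # data starts on the host
x_dev = x.to(device)          # data distribution/communication: host -> device
y_dev = x_dev @ x_dev         # parallel task runs on the device
y = y_dev.to("cpu")           # communication back; this also synchronizes
print(y.shape)                # torch.Size([1024, 1024])
```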
How to develop a new operator?
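One concrete answer, as a hedged sketch: in PyTorch a new operator is a `torch.autograd.Function` that supplies its own forward and backward (the `Square` op here is a made-up example):

```python
import torch

class Square(torch.autograd.Function):
    # A hypothetical new operator: forward computes x^2,
    # backward supplies its gradient for the chain rule.

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # stash inputs needed by backward
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * x    # d(x^2)/dx = 2x

x = torch.tensor(3.0, requires_grad=True)
Square.apply(x).backward()
print(x.grad)                      # tensor(6.)
```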