FPGA Implementation of Convolutional Neural Network for Real-Time Handwriting Recognition

Shichen Qiao; Haining Qiu; Lingkai Zhao; Qikun Liu; Eric J. Hoffman

Machine Learning (ML) has recently become one of the fastest-growing fields in Computer Science. As computer hardware engineers, we are enthusiastic about hardware implementations of popular software ML architectures to optimize their performance, reliability, and resource usage. In this project, we designed a highly configurable, real-time device for recognizing handwritten letters and digits using an Altera DE1 FPGA Kit. To achieve the project goals, we followed various engineering standards, including the IEEE-754 32-bit floating-point standard, the Video Graphics Array (VGA) display protocol, the Universal Asynchronous Receiver-Transmitter (UART) protocol, and the Inter-Integrated Circuit (I2C) protocol. These standards significantly improved the compatibility and reusability of our design and simplified its verification. Following these standards, we designed a 32-bit floating-point (FP) instruction set architecture (ISA). We developed a 5-stage RISC processor in SystemVerilog to manage image processing, matrix multiplications, ML classifications, and user interfaces. Three different ML architectures were implemented and evaluated on our design: Linear Classification (LC), a 784-64-10 fully connected neural network (NN), and a LeNet-like Convolutional Neural Network (CNN) with ReLU activation layers and 36 classes (10 for the digits and 26 for the case-insensitive letters). Training was done in Python scripts, and the resulting kernels and weights were stored in hex files and loaded into the FPGA's SRAM units. Convolution, pooling, data management, and various other ML features were driven by firmware written in our custom assembly language. This paper documents the high-level design block diagrams, the interfaces between the SystemVerilog modules, the implementation details of our software and firmware components, and further discussion of potential impacts.
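The abstract states that trained kernels and weights were exported from Python to hex files in IEEE-754 32-bit format. The following is a minimal sketch of what such an export step could look like, not the authors' actual script. The function names, the one-word-per-line layout, and the big-endian digit order are all assumptions made for illustration.

```python
import struct


def float_to_hex_word(value: float) -> str:
    """Return the IEEE-754 binary32 bit pattern of `value` as 8 hex digits.

    Big-endian digit order is an assumption; the paper's file layout
    is not described in the abstract.
    """
    return struct.pack(">f", value).hex()


def hex_word_to_float(word: str) -> float:
    """Decode an 8-digit hex word back into a Python float."""
    return struct.unpack(">f", bytes.fromhex(word))[0]


def weights_to_hex_lines(weights):
    """Convert an iterable of floats to hex words, one per output line
    (hypothetical layout)."""
    return [float_to_hex_word(w) for w in weights]


if __name__ == "__main__":
    # Values chosen because they are exactly representable in binary32.
    for line in weights_to_hex_lines([1.0, -2.5, 0.15625]):
        print(line)
```

Values that are not exactly representable in 32 bits are rounded to the nearest binary32 value during packing, so a round trip through `hex_word_to_float` returns the rounded weight rather than the original Python float.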

updated: Mon Jun 26 2023 02:54:29 GMT+0000 (UTC)

published: Fri Jun 23 2023 15:31:09 GMT+0000 (UTC)

arXiv
