Verilog Meetups @ Hacker Dojo: the status and the plans for 2024

The first meetups of the Portable SystemVerilog Examples group at Hacker Dojo in Mountain View, California, were brainstorming sessions of a sort. We discussed the electronics industry, the essence of modern chip design, and the challenges of educating new design engineers. Then we moved to a new mode of operation: weekly meetings of the core R&D team, with the goal of preparing educational materials for events aimed at a larger audience. The meetings are generally held on Sundays from 11 am to 2 pm. If you cannot come to Mountain View, you can join online.

The Focus of the Team

The focus of the RTL team is to create open-source microarchitecture examples for future professional chip designers. The examples can be used to prepare for job interviews, train students in universities or support other open-source projects. The examples should be compatible with both commercial and open-source ASIC and FPGA tools and avoid digging into vendor-specific details. We isolate the user from the specifics of Xilinx, Altera, Gowin or Lattice by using automation scripts and board-specific wrappers.

We are not trying to show people that FPGAs are easy; other groups already do that. We assume that people who come to us not only know some coding (and don't mind parameterized modules from the first exercise) but are also not afraid of solving somewhat counter-intuitive design problems involving parallelism, pipelines, flow control, etc.

We may have some additional projects like a simplified set of examples to educate high school students or even to connect FPGA exercises with the exercises on a breadboard with small-scale integration chips, but those are side branches of our main activity.

During the weekly meetings of the core team, we are going to discuss everybody's progress, do code reviews, and plan for the next series of small projects. Sometimes we are going to have a one-hour lecture or a presentation session on the details of the SystemVerilog standard, the simulation semantics, the design methodology, a new FPGA platform or a new tool we want to support.

Self Education to Guarantee the Quality

To ensure the quality of everybody's work, we expect every member who is new to the field to go through the following self-education steps. Those who already know the basics can review the materials to help ensure their quality:

  1. Do all the homework from https://github.com/yuri-panchul/systemverilog-homework.

  2. Run and review all the examples in https://github.com/yuri-panchul/basics-graphics-music

  3. Get a tutorial in Bash if you have not done this before. We use Bash scripts to isolate users from vendor-specific toolchains.

  4. Get a tutorial in Git if you have not done this before. We use Git and GitHub Forks / Pull requests as the vehicle of collaboration inside the team and to publish the examples.

  5. Get a tutorial in Markdown markup language as we use it for documentation.

  6. Study a diagram editor for microarchitecture diagrams such as Lucidspark or draw.io.

  7. Learn how to use wavedrom.com for timing diagrams.

  8. Learn how to use edaplayground.com, which is useful for creating verification examples with full-featured SystemVerilog.

The Recommended Literature

In parallel with going through the problems and examples, we recommend studying the following books and online articles:

  1. Digital Design and Computer Architecture, RISC-V Edition by Sarah Harris and David Harris.

  2. Logic Design and Verification Using SystemVerilog – March 1, 2016 by Donald Thomas. Not to be confused with the same author's older books, which date back to the 1980s.

  3. Digital Design: A Systems Approach by William James Dally and R. Curtis Harting.

  4. IEEE 1800-2017 - Standard for SystemVerilog should be used as a reference.

  5. Verilog Gotcha articles by Stuart Sutherland, Don Mills and Chris Spear: Part 1 and Part 2.

  6. The articles of Cliff Cummings.

    Once we get to verification examples:

  7. SystemVerilog for Verification: A Guide to Learning the Testbench Language Features, 3rd edition (2012) by Chris Spear and Greg Tumbush.

  8. Writing Testbenches using SystemVerilog (2006 edition) by Janick Bergeron.

  9. Getting Started with UVM: A Beginner's Guide - May 22, 2013 by Vanessa R. Cooper. This book does an excellent job of setting up the minimum necessary skeleton for the UVM testbench. However, the BFM driver code in this book is not good for pipelined and out-of-order protocols. There are no good books on BFMs for pipelined and out-of-order protocols - we have examples of how to do this right here, but we need to convert this example into UVM using Vanessa R. Cooper's UVM skeleton example.

    Once we get to CPU-specific examples (like branch predictors, caches and Tomasulo), we recommend:

  10. Modern Processor Design: Fundamentals of Superscalar Processors by John Paul Shen and Mikko H. Lipasti.

  11. Modern System-on-Chip Design by David J. Greaves.

The List of Tasks

Each member should pick a project to work on. Even better is to pick two to four different projects: documentation, board support, a peripheral and microarchitecture.

Here is the current list of tasks:

  1. Documentation: We need to write step-by-step instructions on how to install the toolchain for Altera and work with the basics-graphics-music examples. This is a useful task for a beginner who wants to solidify their knowledge of how to get started and to help other beginners. Allocated.

  2. Documentation: The same for Xilinx.

  3. Documentation: The same for Gowin with the commercial toolchain. Instructions written by Aleksandr Ryabov exist, but they have not been edited yet. One participant agreed to take a look. Allocated.

  4. Documentation: The same for the Gowin + Yosys toolchain.

  5. Documentation: The same for Lattice + Yosys toolchain.

  6. Documentation: The same for Lattice + commercial toolchain. Overall we need a person who can implement the complete support for the Lattice commercial toolchain, not just documentation.

  7. Board support: Altera: bettershengsun, a currently unsupported Cyclone IV board available on AliExpress - see basics-graphics-music/misc/000_todo/bettershengsun.

  8. Board support: Altera: MAX10 StepFPGA, a small breadboardable board.

  9. Board support: Altera: QMTech Starter Kit with Cyclone IV. Need to debug - the current implementation is not working.

  10. Board support: Xilinx: Digilent Cmod A7 and Digilent Cmod S7 / PLTW S7 breadboardable boards. Allocated.

  11. Board support: Gowin: Tang Nano 20K and 9K. Improve pin assignment. Allocated.

  12. Board support: Gowin: Tang Primer 25K. A new board. Allocated.

  13. Board support: Gowin: Runber.

  14. Board support: Gowin: Tang Nano 1K Minimalist.

  15. Board support: Gowin: Tang Nano 4K.

  16. Board support: Lattice: We have 7 Lattice boards and only one is supported; plus there is support for two more using only the Yosys-based toolchain. We need a lead for the Lattice board support in California, able to work with both the commercial toolchain from Lattice and the open-source Yosys-based toolchain.

  17. Peripherals: Improve VGA graphics examples by parameterizing the number of bits per color channel for all the boards available. Some boards have just 1 bit per color channel, others have 8 bits per channel, which allows fine color gradations. Priority.

  18. Peripherals: Create a good example of USB-to-UART that allows connecting any board to a computer via UART. The example should work under both Linux and Windows.

  19. Peripherals: Create an example with a rotary encoder that can be used for an exam.

  20. Peripherals: Create an example with an ultrasound distance measurer that can be used for an exam.

  21. Peripherals: Create an example of using an IR remote.

  22. Peripherals: HDMI support.

  23. Peripherals: SPI joystick, I2C humidity sensor etc.

  24. Toolchain support / Scripting: Add the scripts to support the Lattice commercial toolchain.

  25. Toolchain support / SystemVerilog: Make all examples in basics-graphics-music and valid-ready-etc (repository under construction) compatible with the open-source OpenLane ASIC toolchain. Strategically important for the seminars uniting FPGA and ASIC design.

  26. Toolchain support / OpenLane: Create example-specific Tcl files based on basics-graphics-music/scripts/asic/config.tcl for the microarchitectural and CPU examples. This would allow us to analyze and compare the consequences of design decisions on QoR (Quality of Results: clock frequency and area).

  27. Scripting / Board comparison: Make a script that runs synthesis of all examples for different boards and extracts the FPGA utilization data from the logs into a readable .CSV file that can be viewed in any spreadsheet program.

  28. Scripting / Infrastructure: Modify the scripts to make them compatible with Windows Subsystem for Linux (WSL), not just with Bash from the Git for Windows package. Priority.

  29. Microarchitecture / SystemVerilog Homework Part 5: add valid/ready and backpressure to the pipelined isqrt example from SystemVerilog Homework Part 4. Three variants: global stall, skid buffer and half-performance pipeline register. High priority.

  30. Microarchitecture / SystemVerilog Homework Part 5: add a FIFO after the pipeline and credit-based flow control to the isqrt example from SystemVerilog Homework Part 4. High priority.

  31. Microarchitecture / SystemVerilog Homework Part 5: round-robin arbiter for 2 (example), 3 (TODO), 4 (example), 8 (TODO) requestors. See Arbiters: Design Ideas and Coding Styles by Matt Weber. High priority.

  32. Microarchitecture / valid-ready-etc: combine round-robin arbiter and FIFOs for a demo on an FPGA board.

  33. Microarchitecture / valid-ready-etc: multi-grant arbiter with a demo on an FPGA board. Google the article "Generating fast logic circuits for m-select n-port Round Robin Arbitration" or ask me for a PDF.

  34. Microarchitecture / valid-ready-etc: credit-based flow control for pow5 pipeline example on an FPGA board.

  35. Microarchitecture: FIFO that uses pseudo-dual port SRAM with latency N, bypass and prefetching memory read data.

  36. Microarchitecture: FIFO that uses single port SRAM with latency N, bypass and prefetching memory read data. Compare and contrast with a FIFO that uses pseudo-dual port SRAM.

  37. Microarchitecture: FIFO with multiple push and multiple pop operations in one cycle.

  38. Microarchitecture: Memory controller with in-order read responses and variable memory access latency. ReOrder Buffer (ROB).

  39. Microarchitecture: A possible homework problem: using two single-port memories to implement ping-pong buffering. See Doubling Memory Bandwidth for Network Buffers by Youngmi Joo and Nick McKeown, Stanford.

  40. Microarchitecture: A possible homework or example problem based on multiport memory research by LaForest and others: 1, 2, 3.

  41. Microarchitecture: A linked list controller in hardware. The dynamic structures can be stored in memory. An interview question in several companies.

  42. Microarchitecture: A series of homework problems and FPGA board demos related to AXI-Stream: Removing empty bytes, upsizing, downsizing, shifting the stream by B bytes left or right, converting the interleaved sequences with different IDs into the sequences without interleaving, combining the data from two AXI streams into one, with or without data shift inside a single transfer.

  43. Board support / Memory / Microarchitecture: FIFO that uses SDRAM available on many FPGA boards. Need to implement a memory controller in order to do this.

  44. CPU: Adopt schoolRISCV into the basics-graphics-music / valid-ready-etc infrastructure.

  45. CPU: Add a problem to solve: modify schoolRISCV by adding support for a multiplication instruction using a combinational multiplier in Verilog. Change the CPU instruction decoder, ALU, test, and testbench to verify the result. Demonstrate the difference in maximum clock frequency using OpenLane ASIC synthesis and FPGA synthesis tools.

  46. CPU: Add a problem to solve: modify schoolRISCV by adding support for a multiplication instruction using a pipelined multiplier with a latency of two clock cycles. Two variants: using a stall and using an optimized pipelined CPU implementation.

  47. CPU: Add a problem to solve: modify schoolRISCV to support an instruction memory with zero, one, or two-cycle latency.

  48. CPU: Adopt the MIRISCV core into the basics-graphics-music / valid-ready-etc infrastructure.

  49. CPU: Create an example of three MIRISCV cores sharing the same memory using simple arbitration. Then make this memory a multi-bank memory. Demonstrate bank conflicts and the performance gain on different memory access patterns.

  50. CPU: Create an example of two MIRISCV cores exchanging information with each other using FIFOs connected to memory-mapped registers (gated storage). We can discuss this mechanism in some sessions.

  51. CPU: Connect a cache from the appendix to the Patterson-Hennessy textbook (5th Edition) to the MIRISCV core and demonstrate the performance changes with different memory access patterns.

  52. CPU: Adopt the YRV-Plus RISC-V core, described in Inside an Open-Source Processor – July 1, 2021 by Monte Dalrymple, into the basics-graphics-music / valid-ready-etc infrastructure. Prepare a presentation that compares and contrasts this core with another core by Monte Dalrymple, which follows a pre-RISC-revolution microarchitecture and is described in Microprocessor Design Using Verilog HDL Paperback – 2017 by Monte Dalrymple.

  53. CPU: Research and possibly put into the basics-graphics-music / valid-ready-etc infrastructure a CPU called CORE-V Wally. It is associated with a textbook and is promoted by David Harris.

  54. CPU, long-term: Branch prediction examples.

  55. CPU, long-term: Cache coherency protocol examples: MSI, MESI, MOESI, snooping-based, directory-based.

  56. Non-CPU Custom Computing Machines: Create an example to compute and display a fractal (it can be Mandelbrot, but not necessarily), using multiple computing FSMs working in parallel, sending the results to a frame buffer.

  57. Computer Arithmetic: A library of modules for half-precision floating point arithmetic.

  58. Clock Domain Crossing: A series of examples with FPGA boards based on Cliff Cummings's articles: 1, 2, 3. See the beginning here.

  59. Verification: Convert the educational AXI IP into a UVM example using the UVM skeleton from Vanessa R. Cooper's UVM book as an example.

  60. The beginner's materials: Test the package of simplified examples for high-school students: repo1, repo2, release, a post.
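
To make task 31 more concrete, here is a minimal behavioral sketch of round-robin arbitration. It is written in Python as a reference model, not as synthesizable SystemVerilog, and it is not the homework solution. The function names `round_robin_grant` and `simulate_arbiter` are hypothetical, and we assume that the most recently granted requestor gets the lowest priority in the next cycle.

```python
def round_robin_grant(requests, last_grant):
    """Return the index of the granted requestor, or None if nobody requests.

    requests   -- list of booleans, one per requestor
    last_grant -- index granted most recently; it now has the lowest priority
    """
    n = len(requests)
    # Scan starting from the requestor right after the last granted one.
    for offset in range(1, n + 1):
        candidate = (last_grant + offset) % n
        if requests[candidate]:
            return candidate
    return None


def simulate_arbiter(request_trace, n):
    """Apply one request vector per cycle and collect the grants."""
    last = n - 1  # assumption: requestor 0 has the highest priority after reset
    grants = []
    for requests in request_trace:
        grant = round_robin_grant(requests, last)
        grants.append(grant)
        if grant is not None:
            last = grant  # the priority pointer moves only when a grant happens
    return grants


if __name__ == "__main__":
    trace = [[True, True, True]] * 4 + [[False, False, False],
                                        [True, False, True]]
    print(simulate_arbiter(trace, 3))  # prints [0, 1, 2, 0, None, 2]
```

A model like this can serve as a checker in a testbench: the same request trace is applied to the RTL arbiter and the grants are compared cycle by cycle.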
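
Task 29 names a skid buffer as one of the ways to add valid/ready backpressure. The following is a rough cycle-based sketch of the idea in Python, under our own assumptions: one main register, one skid register, and an upstream ready signal that depends only on registered state. The class `SkidBuffer` and the helper `run_skid_demo` are hypothetical names, and the model is not synthesizable code.

```python
class SkidBuffer:
    """Cycle-based model of a valid/ready skid buffer with two storage slots."""

    def __init__(self):
        self.main_valid = False   # main register drives the downstream side
        self.main_data = None
        self.skid_valid = False   # skid register catches one extra item
        self.skid_data = None

    @property
    def up_ready(self):
        # Registered ready: we can accept as long as the skid slot is free.
        return not self.skid_valid

    def clock(self, up_valid, up_data, down_ready):
        """Advance one cycle; return the item handed downstream, or None."""
        accept = up_valid and self.up_ready
        drain = self.main_valid and down_ready
        out = self.main_data if drain else None

        if drain or not self.main_valid:
            # The main register is free in the next cycle: refill it.
            if self.skid_valid:
                self.main_valid, self.main_data = True, self.skid_data
                self.skid_valid = False
            elif accept:
                self.main_valid, self.main_data = True, up_data
            else:
                self.main_valid = False
        elif accept:
            # Downstream is stalled: park the incoming item in the skid slot.
            self.skid_valid, self.skid_data = True, up_data
        return out


def run_skid_demo(items, ready_pattern):
    """Push items (not None) through the buffer with a given ready pattern."""
    buf = SkidBuffer()
    pending = list(items)
    received = []
    for down_ready in ready_pattern:
        up_valid = bool(pending)
        up_data = pending[0] if pending else None
        accepted = up_valid and buf.up_ready
        out = buf.clock(up_valid, up_data, down_ready)
        if accepted:
            pending.pop(0)
        if out is not None:
            received.append(out)
    return received


if __name__ == "__main__":
    pattern = [True, False, False, True, True, True, True, True]
    print(run_skid_demo([1, 2, 3, 4], pattern))
```

The point of the extra slot is that the stall does not have to propagate upstream combinationally: the item that is already in flight when downstream deasserts ready lands in the skid register instead of being lost.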
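
Task 30 combines a pipeline, a FIFO after it, and credit-based flow control. The sketch below illustrates the general scheme in Python under assumptions of our own: the sender starts with one credit per FIFO slot, spends a credit for every item it launches into the pipeline, and gets a credit back one cycle after the consumer pops an item. The function name `simulate_credits` and its parameters are hypothetical; the pipeline is modeled only as a delay line and does no computation.

```python
from collections import deque


def simulate_credits(items, fifo_depth, pipeline_latency, pop_pattern):
    """Return (received items, maximum FIFO occupancy observed).

    pipeline_latency must be at least 1; items must not be None.
    """
    credits = fifo_depth                          # one credit per FIFO slot
    pipeline = deque([None] * pipeline_latency)   # items in flight
    fifo = deque()
    pending = deque(items)
    received = []
    max_occupancy = 0

    for pop in pop_pattern:
        # Consumer side: pop one item, which frees a slot and returns a credit.
        credit_return = False
        if pop and fifo:
            received.append(fifo.popleft())
            credit_return = True

        # The item leaving the pipeline lands in the FIFO.
        arriving = pipeline.popleft()
        if arriving is not None:
            # Holding a credit is what guarantees the free slot here.
            assert len(fifo) < fifo_depth, "FIFO overflow"
            fifo.append(arriving)

        # Sender side: launch a new item only when a credit is available.
        if pending and credits > 0:
            pipeline.append(pending.popleft())
            credits -= 1
        else:
            pipeline.append(None)

        # The returned credit becomes visible to the sender in the next cycle.
        if credit_return:
            credits += 1

        max_occupancy = max(max_occupancy, len(fifo))

    return received, max_occupancy


if __name__ == "__main__":
    pops = [False] * 6 + [True] * 30
    got, peak = simulate_credits(range(10), fifo_depth=2,
                                 pipeline_latency=3, pop_pattern=pops)
    print(got, peak)
```

Unlike a global stall, the pipeline itself never stops in this scheme. The invariant is that the credits held by the sender, the items in flight, and the items in the FIFO always add up to the FIFO depth, so the FIFO cannot overflow. A FIFO shallower than the round-trip latency limits throughput, which is one of the trade-offs the exercise can demonstrate.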
