# RV64I Base Integer Instruction Set

This is step 2 of the book Writing a RISC-V Emulator from Scratch in 10 Steps, whose goal is running xv6, a small Unix-like OS, in your emulator in the final step.

The source code is available at d0iasm/rvemu-for-book/step02/.

In the end of this page, we can execute the sample file that calculates a Fibonacci number in our emulator. We will support RV64 ISAs, the base integer instruction set a 64-bit architecture, to calculate a Fibonacci number.

Sample binary files are also available at d0iasm/rvemu-for-book/step02/. We successfully see the result of the 10th Fibonacci number when we execute the sample binary file `fib.bin`.

`// fib.c contains the following C code and fib.bin is the build result of it:// int fib(int n);// int main() {//   return fib(10); // Calculate the 10th fibonacci number.// }// int fib(int n) {//   if (n == 0 || n == 1)//     return n;//   else//     return (fib(n-1) + fib(n-2));// }​\$ cargo run fib.bin...           x12=0x0 x13=0x0 x14=0x1 x15=0x37 // x15 should contain 55 (= 10th fibonacci number).`

# RV64I: Base Integer Instruction Set

RV64I is a base integer instruction set for the 64-bit architecture, which builds upon the RV32I variant. RV64I shares most of the instructions with RV32I but the width of registers is different and there are a few additional instructions only in RV64I.

In this step, we're going to implement 47 instructions (35 instructions from RV32I and 12 instructions from RV64I). We've already implemented `add` and `addi` so we'll skip them. Also, we'll skip implementing `fence`, `ecall`, and `ebreak` for now. I'll cover `ecall` and `ebreak` in the following step and won't explain `fence`. The `fence` instruction is a type of barrier instruction to apply an ordering constraint on memory operations issued before and after it. We don't need it since our emulator is a single core system and doesn't reorder memory operations (out-of-order execution).

Fig 2.1 and Fig 2.2 are the lists for RV32I and RV64I, respectively. We're going to implement all instructions in the figures. Fig 2.1 RV32I Base Instruction Set (Source: RV32I Base Instruction Set table in Volume I: Unprivileged ISA) Fig 2.2 RV64I Base Instruction Set (Source: RV64I Base Instruction Set table in Volume I: Unprivileged ISA)

# CPU Module

First, we're going to divide the implementation of the CPU from the `main.rs` file. Rust provides a module system to split code into logical units and organize visibility. We're going to move the implementation of CPU to a new file `cpu.rs`.

In order to define a cpu module we need to `mod` keyword at the beginning of the main file. Also `use` keyword allows us to use public items in the cpu module.

`src/main.rs// This declaration will look for a file named `cpu.rs` or `cpu/mod.rs` and will// insert its contents inside a module named `cpu` under this scope.mod cpu;​// Use all public structures, methods, and functions defined in the cpu module.use crate::cpu::*;`
`src/cpu.rs// `pub` keyword allows other modules use the `Cpu` structure and methods// relating to it. pub struct Cpu {    ...}​impl Cpu {    ...}`

## Fetch-decode-execute Cycle

The step 1 already mentioned the fetch-decode-execute cycle and we're going to implement it in the `main.rs`. An emulator is ideally an infinite loop and executes program infinitely unless something wrong happens or a user stops an emulator explicitly. However, we're going to stop an emulator implicitly when the program counter is 0 or over the length of memory, and an error happens during the execution.

`src/main.rsfn main() -> io::Result<()> {    ...    while cpu.pc < cpu.memory.len() as u64 {        // 1. Fetch        let inst = cpu.fetch();                // 2. Add 4 to the program counter.        cpu.pc += 4;​        // 3. Decode.        // 4. Execute.        match cpu.execute(inst) {            // True if an error happens.            true => break,            false => {}        };​        // This is a workaround for avoiding an infinite loop.        if cpu.pc == 0 {            break;        }    }    ...}`

## Fetch Stage

The fetch stage is basically the same as the previous step, but I prefer to create a new function to read 32-bit data from a memory because there are other instructions to read and write memory.

`src/cpu.rsimpl Cpu {    ...      fn read32(&self, addr: u64) -> u64 {        let index = addr as usize;        return (self.memory[index] as u64)            | ((self.memory[index + 1] as u64) << 8)            | ((self.memory[index + 2] as u64) << 16)            | ((self.memory[index + 3] as u64) << 24);    }    ...        pub fn fetch(&self) -> u32 {        return self.read32(self.pc) as u32;    }    ...}`

## Decode Stage

The decode stage is almost the same as the previous step too and we'll add 2 new fields `funct3` and `funct7`. `funct3` is located from 12 to 14 bits and `funct7` is located from 25 to 31 bits as we can see in Fig 2.1 and 2.2. These fields and opcode select the type of operation.

`src/cpu.rsimpl Cpu {    ...     fn execute(&mut self, inst: u32) {        ...        let funct3 = (inst & 0x00007000) >> 12;        let funct7 = (inst & 0xfe000000) >> 25;        ...`

In RISC-V, there are many common positions in all formats, but decoding an immediate value is quite different depending on instructions, so we'll decode an immediate value in each operation.

For example, the immediate value in branch instructions is located in the place of `rd` and `funct7`. A branch instruction is a `if` statement in C to change the sequence of instruction execution depending on a condition, which includes `beq`, `bne`, `blt`, `bge`, `bltu`, and `bgeu`.

Decoding is performed by bitwise ANDs and bit shifts. The point to be noted is that an immediate value should be sign-extended. It means we need to fill in the upper bits with 1 when the significant bit is 1. In this implementation, filling in bits with 1 is performed by casting from a signed integer to an unsigned integer.

The way how to decode each instruction is listed in Fig 2.1 and Fig 2.2.

`src/cpu.rsimpl Cpu {    ...     fn execute(&mut self, inst: u32) {        ...        match opcode {            0x63 => {                // imm[12|10:5|4:1|11] = inst[31|30:25|11:8|7]                let imm = (((inst & 0x80000000) as i32 as i64 >> 19) as u64)                    | ((inst & 0x80) << 4)   // imm                    | ((inst >> 20) & 0x7e0) // imm[10:5]                    | ((inst >> 7) & 0x1e);  // imm[4:1]                                    match funct3 {                    ...`

## Execute Stage

Each operation is performed in each `match` arm. For example, a branch instruction `beq`, which is one of the branch instructions, is executed when `opcode` is 0x63 and `funct3` is 0x0. `beq` sets the `pc` to the current `pc` plus the signed-extended offset if a value in `rs1` equals a value in `rs2`. The current `pc` means the position when CPU fetched an instruction from memory so we need to subtract 4 from `pc` because we added 4 after fetch.

`src/cpu.rsimpl Cpu {    ...     fn execute(&mut self, inst: u32) {        ...        match opcode {            0x63 => {                let imm = ...;​                match funct3 {                    0x0 => {                        // beq                        if self.regs[rs1] == self.regs[rs2] {                            self.pc = self.pc.wrapping_add(imm).wrapping_sub(4);                        }                    }                ...`

# Memory Module

// TODO: write this section

# System Bus Module

// TODO: write this section

# Instructions List

The following table is a brief explanation for each instruction. The book won't describe the details of each instruction but will indicate points to be noted when you implement instructions. In addition, you can see the implementation in d0iasm/rvemu-for-book/step02/src/cpu.rs and description in Chapter 2 RV32I Base Integer Instruction Set and Chapter 5 RV64I Base Integer Instruction Set in the unprivileged specification.

Points to be noted:

• Arithmetic operations are done by wrapping_* functions to avoid an overflow.

• Sign-extended is done by casting from a smaller signed integer to a larger signed integer.

• The amount for 64-bit shift operations is encoded in the lower 6 bits in an immediate, and the amount for 32-bit shift operations is encoded in the lower 5 bits.

# Testing

We're going to test instructions we implemented in this step by calculating a Fibonacci number and check if the registers are expected values. I prepared a sample binary file available at d0iasm/rvemu-for-book/step02/. Download the fib.bin file and execute it in your emulator.

Calculating a Fibonacci number is actually not enough to test all RV64I instructions, so it perhaps be better to use riscv/riscv-tests to make sure if your implementation is correct. However, it's not obvious how to use riscv-tests so I'll skip to use the test in this book for the sake of simplicity. If you are interested in using riscv-tests, the test file in rvemu may be helpful.

`// fib.c contains the following C code and fib.bin is the build result of it:// int fib(int n);// int main() {//   return fib(10); // Calculate the 10th fibonacci number.// }// int fib(int n) {//   if (n == 0 || n == 1)//     return n;//   else//     return (fib(n-1) + fib(n-2));// }​\$ cargo run fib.bin...           x12=0x0 x13=0x0 x14=0x1 x15=0x37 // x15 should contain 55 (= 10th fibonacci number).`

## How to Build Test Binary

If you want to execute a bare-metal C program you write, you need to make an ELF binary without any headers because our emulator just starts to execute at the address `0x0` . The Makefile helps you build a test binary.

`\$ riscv64-unknown-elf-gcc -S fib.c\$ riscv64-unknown-elf-gcc -Wl,-Ttext=0x0 -nostdlib -o fib fib.s\$ riscv64-unknown-elf-objcopy -O binary fib fib.bin`