Making a micro Linux distro (2023)


# Building a Tiny (Micro) Linux Distribution from Scratch

In this article, we’ll explore how to build a tiny (micro) Linux “distribution” from scratch. While this distribution won’t have many functions, it will be built entirely from the ground up. We will compile the Linux kernel ourselves and write some software to package our micro-distro.

For this example, we will focus on the RISC-V architecture, specifically QEMU’s `riscv64` virt machine. However, there’s very little about this process that is architecture-specific, so you can perform a very similar exercise on other architectures like x86.

Recently, we covered the RISC-V boot process with SBI (Supervisor Binary Interface) and bare metal programming for RISC-V, so this is just a continuation up the software stack.

### A Quick Warning

This article presents a very simplified view of a Linux distribution. Some details may not be 100% accurate—think of it as 99.9% accurate. It’s aimed at beginners to help form a basic mental framework for understanding Linux systems. More advanced users might find some parts overly simplified.

## Table of Contents

- What is an OS kernel?
- What is a Linux distribution?
- How does “infrastructure on top of infrastructure” run?
- The init process (and its “children”)
- Building our almost useless Linux micro distribution
- Building a Linux operating system for RISC-V
- First obstacles
- Building the initramfs
- So what is an operating system?
- Bonus section: making an actually useful micro distribution with u-root

## Building Our Mini Linux Micro Distribution

Let’s update our `file_list.txt` to include the following essential files:

```
init
little_shell
```

Then pack it all up again using:

```bash
cpio -o -H newc < file_list.txt > initramfs.cpio
```

### Running the Micro Distribution in QEMU

Run the following command:

```bash
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -initrd /PATH/TO/NEWLY_BUILT/initramfs.cpio
```

You should see output similar to this:

```
[ 0.356314] Run /init as init process
Hello from the original init! 1
Hello world from Go!
Enter your command: [[[mkdir hello]]]
Your command is: mkdir hello
Enter your command: [[[ls]]]
Your command is: ls
Enter your command:
Hello from the original init! 2
[[[echo 123]]]
Your command is: echo 123
Enter your command: [[[exit]]]
Your command is: exit
Enter your command:
Hello from the original init! 3
[[[I give up!]]]
Your command is: I give up!
```

Note: The bits enclosed in triple square brackets (`[[[ ]]]`) are user inputs sent over UART.

### Understanding the Output Behavior

In this console excerpt, we see these three things interleaved on the UART interface:

- The original init’s periodic output every 10 seconds.
- Output from the `little_shell`.
- Input from the user.

All output is mixed because both the `init` process and `little_shell` print to the same standard output, which in Linux is treated as an open file. When `little_shell` was forked from `init`, it inherited the open file descriptors, including standard input and output, so they share these I/O streams.

A system can have many I/O devices, but here every process’s input and output goes through the same one: the UART. The Linux kernel hides the UART’s hardware details and exposes it to processes as file handles.
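The descriptor sharing described above can be reproduced outside the VM. In this minimal Go sketch, a pipe stands in for the UART-backed console so the result can be inspected. The parent writes one line to the pipe and then starts a child whose standard output is that same pipe. Both lines arrive in one stream, just as the output of `init` and `little_shell` both reach the UART. The function name `sharedWrite` is hypothetical, and the sketch assumes an `echo` binary is on `PATH`.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
)

// sharedWrite has the parent and a child process write to the same open file.
// Because cmd.Stdout is an *os.File, os/exec hands that descriptor directly
// to the child as its fd 1 rather than copying output through the parent.
func sharedWrite() (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	fmt.Fprintln(w, "hello from parent")

	cmd := exec.Command("echo", "hello from child")
	cmd.Stdout = w // the child inherits this open file
	if err := cmd.Run(); err != nil {
		w.Close()
		return "", err
	}
	// Close our write end so ReadAll sees EOF; the child's copy closed when it exited.
	w.Close()

	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	out, err := sharedWrite()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```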

### What Did We Achieve?

We now have a very minimal but homemade Linux distribution! You can share it with friends and even try expanding it:

- Implement a mini shell that actually understands commands like `mkdir`.
- Incorporate process forking to execute commands properly.

The possibilities are endless—you’re in the Linux userspace!

## Did the Linux Kernel Keep Its Promises?

Let’s reflect on whether the Linux kernel delivered on its core promises in our setup:

- **Hardware Abstraction:** Neither our `init` nor `little_shell` knew anything directly about the UART hardware. They wrote to file handles provided by the kernel, and the kernel invoked the UART driver, which may in turn use SBI underneath.

- **High-Level Programming Paradigms:** The kernel introduced filesystems as an abstraction.

- **Process Isolation:** After the shell was forked from `init`, the two ran independently, each with its own memory. File handles were shared through inheritance, and processes can share memory too, but only when explicitly configured to do so.

There’s much more the kernel provides, but even this minimal setup shows how it offers a portable infrastructure for high-level software development.

## So, What *Is* an Operating System?

This question is often debated and can be a game of semantics.

- Some consider the *kernel* itself to be the operating system.
- Others refer to the entire Linux *distribution* as the operating system.
- Still others might have entirely different definitions.

The important takeaway is understanding the distinction between kernel space and user space, what the Linux kernel offers, and what runs on top of it. Hopefully, you now have a clearer mental picture of these responsibilities and boundaries.

## Bonus Section: Making an Actually Useful Micro Distribution with u-root

Why stop at a useless shell? Let’s boot into something truly useful—a user space environment where you can run commands like `ls`, `mkdir`, `echo`, and more.

I highly recommend the [u-root](https://github.com/u-root/u-root) project for this purpose.

### What is u-root?

Although the project describes itself in terms of “Go bootloaders,” u-root’s bootloaders do not run on bare metal. They run **on top of a live Linux kernel** and use Linux’s `kexec` to load and boot a different kernel from user space.

We won’t use the bootloader functionality here; instead, we’ll focus on using u-root to generate a complete user space environment.

### Using u-root

1. **Install u-root** following their instructions, ending up with a `u-root` binary in your PATH.

2. **Clone the repository and generate an initramfs image:**

```bash
git clone https://github.com/u-root/u-root.git
cd u-root
GOOS=linux GOARCH=riscv64 u-root
```

The output will end with something like:

```
18:31:31 Successfully built "/tmp/initramfs.linux_riscv64.cpio" (size 14827284).
```

3. **Run QEMU with the new initramfs:**

```bash
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -initrd /tmp/initramfs.linux_riscv64.cpio
```

### Exploring u-root’s Shell Environment

On boot, you’ll see:

```
[ 0.400269] Run /init as init process
2023/09/12 01:34:33 Welcome to u-root!

/# ls
bbin bin buildbin dev env etc go init lib lib64 proc root sys tcz tmp ubin usr var
/# pwd
/
/# echo "Hello world!"
Hello world!
```

The u-root shell supports tab-completion and many standard Unix commands with typical flags.

### Adding Network Support and Visiting Google.com

To enable networking in QEMU and u-root:

1. **Add network devices via QEMU CLI:**

```bash
-device virtio-net-device,netdev=usernet -netdev user,id=usernet,hostfwd=tcp::10000-:22 -device virtio-rng-pci
```

2. **Ensure kernel config includes:**

- `CONFIG_VIRTIO_PCI=y`
- `CONFIG_HW_RANDOM_VIRTIO=y`
- `CONFIG_CRYPTO_DEV_VIRTIO=y`

3. **Rebuild your kernel if needed.**

4. **Start QEMU with the network devices attached:**

```bash
qemu-system-riscv64 -machine virt -kernel arch/riscv/boot/Image -initrd /tmp/initramfs.linux_riscv64.cpio \
-device virtio-net-device,netdev=usernet -netdev user,id=usernet,hostfwd=tcp::10000-:22 -device virtio-rng-pci
```

5. **In u-root’s shell, check IP addresses:**

```bash
/# ip addr
```

If `eth0` is down or has no IP address, configure it over DHCP by running:

```bash
/# dhclient -ipv6=false
```

6. **Verify that the network is configured:**

You should see DHCP lease information, including an assigned IP address.
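The same check can also be done from a program. This Go sketch looks up an interface by name and lists its IPv4 addresses, which is roughly the IPv4 part of what `ip addr` prints. The function name `ipv4Addrs` and the default interface `eth0` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// ipv4Addrs returns the IPv4 addresses (in CIDR form) assigned to the named interface.
func ipv4Addrs(name string) ([]string, error) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return nil, err
	}
	addrs, err := iface.Addrs()
	if err != nil {
		return nil, err
	}
	var out []string
	for _, a := range addrs {
		// Keep only IPv4 entries; To4 returns nil for IPv6 addresses.
		if ipnet, ok := a.(*net.IPNet); ok && ipnet.IP.To4() != nil {
			out = append(out, ipnet.String())
		}
	}
	return out, nil
}

func main() {
	name := "eth0" // assumed interface name, as in the steps above
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	addrs, err := ipv4Addrs(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(addrs) == 0 {
		fmt.Printf("%s has no IPv4 address yet; try dhclient\n", name)
		return
	}
	fmt.Printf("%s: %v\n", name, addrs)
}
```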

### Accessing google.com via wget

Since `ping` may not work with QEMU’s user-mode networking, use `wget` to fetch Google’s homepage:

```bash
/# wget http://google.com
```

You can then read the downloaded `index.html`:

```bash
/# cat index.html
```

Expect to see obfuscated JavaScript—proof that the request succeeded!

## Package Managers in Linux Distributions

By now, you likely understand that package managers are critical software gateways on a Linux distribution. They enable adding, updating, or removing software without rebuilding entire images.

Our approach here was an embedded style: building monolithic images for embedded devices that get replaced entirely upon update. This doesn’t suit desktops or phones. Package managers fill that role.

We won’t discuss package managers in detail now but keep them in mind as you explore further.

## The Monster of init

The simple `init` we created merely launched a shell, but `init` is a core system component that sets up many essential subsystems—such as device nodes in `/dev`.

For example, in your u-root environment, run:

```bash
/# ls /dev
```

to see numerous devices initialized by `init`.

Different Linux distributions use different, often sophisticated `init` systems (e.g., systemd, SysVinit). Designing an `init` system well is a science in itself.

You can dive into u-root’s source code to see how its `init` works.

For more information and resources, feel free to explore the [u-root GitHub repository](https://github.com/u-root/u-root).

I hope this journey through building and understanding a micro Linux distribution sparked your imagination and gave you a clearer understanding of Linux internals. Happy hacking!
https://popovicu.com/posts/making-a-micro-linux-distro/
