Fault Injection

English

RustOS fault injection is a controlled way to make common OS boundaries fail on purpose. It is for hardening recovery paths, not for random breakage.

The current path is:

  1. Rules live in config/rustos.toml under [fault_injection].
  2. The cargo xtask subcommands run, debug, and probe-display pass the rules to QEMU through fw_cfg as the file opt/rustos/fault-injection.
  3. The kernel reads that fw_cfg file during boot, once the heap has been initialized.
  4. Selected kernel boundaries call should_fail("fault.point").
  5. If a rule fires, that boundary returns the same kind of failure it would return for a real device, storage, allocation, or IPC problem.
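
The last two steps can be sketched from the kernel side as follows. This is a minimal sketch under stated assumptions, not the RustOS implementation: it uses std so that it runs on a host, and the RULES table, install_rule, BlockError, and read_block are hypothetical names introduced for illustration. Only the should_fail("fault.point") call shape and the DeviceFault failure kind come from the text.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical rule table: fault point name -> "fail every call".
// In the flow described above, the kernel would fill its table from the
// fw_cfg file during boot; here it is filled by hand.
static RULES: Mutex<Option<HashMap<String, bool>>> = Mutex::new(None);

/// Hypothetical helper standing in for the boot-time rule load.
pub fn install_rule(point: &str, fail: bool) {
    let mut guard = RULES.lock().unwrap();
    guard
        .get_or_insert_with(HashMap::new)
        .insert(point.to_string(), fail);
}

/// Returns true when the named fault point should fail this call.
/// Points without a rule never fail.
pub fn should_fail(point: &str) -> bool {
    let guard = RULES.lock().unwrap();
    guard
        .as_ref()
        .and_then(|rules| rules.get(point).copied())
        .unwrap_or(false)
}

/// Illustrative error type; only the DeviceFault name is from the text.
#[derive(Debug, PartialEq)]
pub enum BlockError {
    DeviceFault,
}

/// Hypothetical block-read boundary. When the rule fires, it returns the
/// same error a real device problem would produce, so callers exercise
/// their normal recovery path.
pub fn read_block(lba: u64, buf: &mut [u8]) -> Result<(), BlockError> {
    if should_fail("block.read") {
        return Err(BlockError::DeviceFault);
    }
    // Stand-in for the real device transfer.
    buf.fill(lba as u8);
    Ok(())
}
```

The point of this shape is that the caller cannot tell an injected failure from a real one: read_block has a single error path, and the fault point only decides when it is taken.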

Configuration

[fault_injection]
enabled = true
rules = ["display.present=off"]

Rule format:

fault.point=action

Actions:

| Action | Meaning |
| --- | --- |
| off | Register the point but do not fail. Good default. |
| fail | Fail every call. |
| drop-every:N | Fail every Nth call. |
| fail-after:N | Let N calls pass, then fail later calls. |
| rate:N | Fail about N out of 1000 calls. |
| delay-ms:N | Parsed for future use; delay injection is not wired yet. |
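
To make the action semantics concrete, here is a rough sketch of how a rule string could be parsed and evaluated. It is not the shared parser in libs/rustos-fault-injection: the Action enum, parse_rule, and fires are hypothetical names, and several details are assumptions, namely that calls are counted from 1, that drop-every:N fails calls N, 2N, 3N and so on, that drop-every:0 is rejected, and that rate:N compares N against a caller-supplied roll in 0..1000.

```rust
/// Actions from the table above. Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Off,
    Fail,
    DropEvery(u64),
    FailAfter(u64),
    Rate(u64),
    DelayMs(u64),
}

/// Parses "fault.point=action" into (point, action).
/// Returns None for anything that does not match the rule format.
pub fn parse_rule(rule: &str) -> Option<(&str, Action)> {
    let (point, action) = rule.split_once('=')?;
    if point.is_empty() {
        return None;
    }
    // Split an optional ":N" argument off the action name.
    let (name, arg) = match action.split_once(':') {
        Some((name, raw)) => (name, Some(raw.parse::<u64>().ok()?)),
        None => (action, None),
    };
    let action = match (name, arg) {
        ("off", None) => Action::Off,
        ("fail", None) => Action::Fail,
        // Assumption: N = 0 is rejected because "every 0th call" is undefined.
        ("drop-every", Some(n)) if n > 0 => Action::DropEvery(n),
        ("fail-after", Some(n)) => Action::FailAfter(n),
        ("rate", Some(n)) => Action::Rate(n),
        ("delay-ms", Some(n)) => Action::DelayMs(n),
        _ => return None,
    };
    Some((point, action))
}

/// Decides whether call number `call` (1-based) fails under `action`.
/// `roll` is a value in 0..1000 supplied by the caller; only `rate` uses it.
pub fn fires(action: Action, call: u64, roll: u64) -> bool {
    match action {
        // delay-ms is parsed but never causes a failure.
        Action::Off | Action::DelayMs(_) => false,
        Action::Fail => true,
        Action::DropEvery(n) => n != 0 && call % n == 0,
        Action::FailAfter(n) => call > n,
        Action::Rate(n) => roll < n,
    }
}
```

Under these assumptions, fail-after:50 lets calls 1 through 50 pass and fails call 51 onward, and rate:5 fails only when the roll lands in 0..5. Keeping the roll as a parameter makes the decision deterministic in tests. Where a real implementation gets its randomness is not covered here.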

Keep the rules = [...] array on one physical line for now. The logging cfg generator also scans config/rustos.toml, and its parser does not yet accept a multiline array whose closing bracket sits on its own line.

Fault Points

| Point | Simulated failure |
| --- | --- |
| alloc.frame | Physical frame allocation returns None. |
| block.read | Block device read returns DeviceFault. |
| block.write | Block device write returns DeviceFault. |
| display.present | Display present is dropped. |
| display.provider.register | Driver framebuffer provider registration fails. |
| driver.module.load | Loadable driver module load fails. |
| input.event.enqueue | Pointer/input event is dropped before enqueue. |
| pci.config.read | Linux compat PCI config read returns an I/O-style error. |
| process.spawn | User process spawn fails as if no task slot was available. |
| socket.recv | Socket receive returns a retryable error. |
| socket.send | Socket send returns a retryable error. |
| virtio-gpu.control.submit | VirtIO GPU control command submission fails. |

Examples

Drop every tenth display present:

[fault_injection]
enabled = true
rules = ["display.present=drop-every:10"]

Let the first 50 block reads pass so early boot can make some progress, then fail storage reads:

[fault_injection]
enabled = true
rules = ["block.read=fail-after:50"]

Inject a low-rate IPC/socket send failure (about 5 in 1000 calls):

[fault_injection]
enabled = true
rules = ["socket.send=rate:5"]

You can also pass a rule for one run without editing the config:

cargo xtask run --fault display.present=drop-every:10 --timeout 35 --summarize-log -- --no-reboot

Adding New Points

Add fault points at real failure boundaries, not inside arbitrary helper functions. Good places are allocator, block I/O, device registration, queue submit, process spawn, socket/IPC send and receive, and driver probe or load boundaries.

Use the existing shared parser in libs/rustos-fault-injection and the kernel runtime in kernel/nucleus-core/src/util/fault_injection.rs. Do not invent a one-off config format for a single subsystem.
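
As an illustration of the placement guidance above, the following sketch puts a fault point at a queue-submit boundary rather than in the helper beneath it. Everything here is hypothetical: the "queue.submit" point name, the Queue type, SubmitError, and the set_queue_submit_fault switch are invented for the example, and should_fail is a local stub standing in for the kernel runtime.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the kernel runtime: a single switch for one hypothetical point.
static FAIL_QUEUE_SUBMIT: AtomicBool = AtomicBool::new(false);

/// Hypothetical test hook that turns the injected failure on or off.
pub fn set_queue_submit_fault(on: bool) {
    FAIL_QUEUE_SUBMIT.store(on, Ordering::Relaxed);
}

/// Local stub with the same call shape as should_fail("fault.point").
fn should_fail(point: &str) -> bool {
    point == "queue.submit" && FAIL_QUEUE_SUBMIT.load(Ordering::Relaxed)
}

#[derive(Debug, PartialEq)]
pub enum SubmitError {
    QueueFull,
}

pub struct Queue {
    slots: Vec<u32>,
    capacity: usize,
}

impl Queue {
    pub fn new(capacity: usize) -> Self {
        Queue { slots: Vec::new(), capacity }
    }

    /// The boundary. Callers already handle SubmitError here, so the
    /// injected failure reuses the error a genuinely full queue returns.
    pub fn submit(&mut self, cmd: u32) -> Result<(), SubmitError> {
        if should_fail("queue.submit") || self.slots.len() >= self.capacity {
            return Err(SubmitError::QueueFull);
        }
        self.push_slot(cmd);
        Ok(())
    }

    // Internal helper: no fault point here. It has no failure mode of its
    // own, so a failure injected at this level would not correspond to
    // anything callers are expected to recover from.
    fn push_slot(&mut self, cmd: u32) {
        self.slots.push(cmd);
    }

    pub fn len(&self) -> usize {
        self.slots.len()
    }
}
```

With the fault enabled, submit returns QueueFull without touching the queue, which is the state a caller would observe after a real rejection.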
