Monday, July 3, 2017

FltCreateFile and top device.

FltCreateFile calls IoCreateFileEx with IO_DRIVER_CREATE_CONTEXT.DeviceObjectHint pointing to the Filter Manager's filter object and then calls the lower registered filters. That allows the created file object to have TopDeviceObjectHint pointing to the Filter Manager's object.
 # Child-SP          RetAddr           Call Site
00 ffffe101`3fffebf8 fffff801`92125200 FLTMGR!FltpCreate
01 ffffe101`3fffec00 fffff801`9213058b nt!IopParseDevice+0x7f0
02 ffffe101`3fffedd0 fffff801`921340c0 nt!ObpLookupObjectName+0x46b
03 ffffe101`3fffefa0 fffff801`9213803a nt!ObOpenObjectByNameEx+0x1e0
04 ffffe101`3ffff0e0 fffff801`920b0eb4 nt!IopCreateFile+0x3aa
05 ffffe101`3ffff180 fffff808`485240d5 nt!IoCreateFileEx+0x124
06 ffffe101`3ffff210 fffff808`4853d32d FLTMGR!FltpCreateFile+0x1cd
07 ffffe101`3ffff310 fffff808`4b6a79f8 FLTMGR!FltCreateFile+0x8d
08 ffffe101`3ffff3a0 fffff808`484f4b4c avscan!AvPreCreate+0x378 [d:\work\avscan\filter\avscan.c @ 2106]
09 ffffe101`3ffff4b0 fffff808`484f46ec FLTMGR!FltpPerformPreCallbacks+0x2ec
0a ffffe101`3ffff5d0 fffff808`48526117 FLTMGR!FltpPassThroughInternal+0x8c
0b ffffe101`3ffff600 fffff801`92125200 FLTMGR!FltpCreate+0x2d7
0c ffffe101`3ffff6b0 fffff801`9213058b nt!IopParseDevice+0x7f0
0d ffffe101`3ffff880 fffff801`921340c0 nt!ObpLookupObjectName+0x46b
0e ffffe101`3ffffa50 fffff801`920c9e90 nt!ObOpenObjectByNameEx+0x1e0

0: kd> dt nt!_FILE_OBJECT ffff948c621a3330
   +0x000 Type             : 0n5
   +0x002 Size             : 0n216
   +0x008 DeviceObject     : 0xffff948c`60a3bc80 _DEVICE_OBJECT
   +0x010 Vpb              : 0xffff948c`60a556e0 _VPB
   +0x018 FsContext        : 0xffff948c`6111a740 Void
   +0x020 FsContext2       : 0xffff8483`76cf78f0 Void
   +0x028 SectionObjectPointer : (null) 
   +0x030 PrivateCacheMap  : (null) 
   +0x038 FinalStatus      : 0n0
   +0x040 RelatedFileObject : (null) 
   +0x048 LockOperation    : 0 ''
   +0x049 DeletePending    : 0 ''
   +0x04a ReadAccess       : 0x1 ''
   +0x04b WriteAccess      : 0 ''
   +0x04c DeleteAccess     : 0 ''
   +0x04d SharedRead       : 0x1 ''
   +0x04e SharedWrite      : 0x1 ''
   +0x04f SharedDelete     : 0x1 ''
   +0x050 Flags            : 0x40000
   +0x058 FileName         : _UNICODE_STRING "\"
   +0x068 CurrentByteOffset : _LARGE_INTEGER 0x0
   +0x070 Waiters          : 0
   +0x074 Busy             : 0
   +0x078 LastLock         : (null) 
   +0x080 Lock             : _KEVENT
   +0x098 Event            : _KEVENT
   +0x0b0 CompletionContext : (null) 
   +0x0b8 IrpListLock      : 0
   +0x0c0 IrpList          : _LIST_ENTRY [ 0xffff948c`621a33f0 - 0xffff948c`621a33f0 ]
   +0x0d0 FileObjectExtension : 0xffff948c`6226f1b0 Void
   
0: kd> dq 0xffff948c`6226f1b0
ffff948c`6226f1b0  00000000`00000000 00000000`00000000
ffff948c`6226f1c0  ffff948c`60dff0d0 00000000`00000000
ffff948c`6226f1d0  ffff948c`6243f2c0 00000000`00000000
ffff948c`6226f1e0  00000000`00000000 00000000`00000000
ffff948c`6226f1f0  00000000`00000000 00000000`00000000
ffff948c`6226f200  61436d4d`02120006 00000000`0000034c
ffff948c`6226f210  ffff8483`7535ed10 ffff948c`62161e28
ffff948c`6226f220  ffff948c`6221da78 00000000`00000000

0: kd> dq ffff948c`60dff0d0
ffff948c`60dff0d0  ffff948c`610734a0 00000000`00000000
ffff948c`60dff0e0  00000000`00000000 00000000`00000000
ffff948c`60dff0f0  65536d4d`02060003 6c8da38a`069a7123
ffff948c`60dff100  00000000`00000000 0000024e`49c8000a
ffff948c`60dff110  0000024e`49c80fff 00000000`00000000
ffff948c`60dff120  00000000`00000000 00000000`00000000
ffff948c`60dff130  00000000`00000000 00000000`00000000
ffff948c`60dff140  00000000`00000002 00000000`00000000

0: kd> !object ffff948c`610734a0
Object: ffff948c610734a0  Type: (ffff948c5e34eb00) Device
    ObjectHeader: ffff948c61073470 (new version)
    HandleCount: 0  PointerCount: 1
    
0: kd> !devstack ffff948c610734a0
  !DevObj           !DrvObj            !DevExt           ObjectName
> ffff948c610734a0  \FileSystem\FltMgr ffff948c610735f0  
  ffff948c61048060  \FileSystem\fastfatffff948c610481b0 
  
0: kd> !vpb 0xffff948c`60a556e0
Vpb at 0xffff948c60a556e0
Flags: 0x1 mounted 
DeviceObject: 0xffff948c61048060
RealDevice:   0xffff948c60a3bc80
RefCount: 8
Volume Label: 

Tuesday, June 27, 2017

Sunday, June 25, 2017

Magenta RISC-V booting

Below is an output of Magenta RISC-V port riscv-magenta

[1266874889.709] 00000.00000> Available physical memory: 2032dMB
[1266874889.709] 00000.00000> 
welcome to lk/MP

[1266874889.709] 00000.00000> INIT: cpu 0, calling hook 0xffffffff8006e2d4 (global_prng_seed) at level 0x30000, flags 0x1
[1266874889.709] 00000.00000> WARNING: System has insufficient randomness.  It is completely unsafe to use this system for any cryptographic applications.
[1266874889.709] 00000.00000> INIT: cpu 0, calling hook 0xffffffff800386b0 (elf_build_id) at level 0x3fffe, flags 0x1
[1266874889.709] 00000.00000> INIT: cpu 0, calling hook 0xffffffff800385f0 (version) at level 0x3ffff, flags 0x1
[1266874889.709] 00000.00000> version:
[1266874889.709] 00000.00000>  arch:     RISCV
[1266874889.709] 00000.00000>  platform: RISCV_RV64
[1266874889.709] 00000.00000>  target:   QEMU_RISCV_RV64
[1266874889.709] 00000.00000>  project:  MAGENTA_QEMU_RISCV_RV64
[1266874889.709] 00000.00000>  buildid:  GIT_0FBB50D3A8FC1D242E1D7A2921674579C9192D66_DIRTY_LOCAL
[1266874889.709] 00000.00000>  ELF build ID: be2909891fc4cce74084f5fa3f14d6e98903d3bb
[1266874889.709] 00000.00000> INIT: cpu 0, calling hook 0xffffffff80042f0c (vm_preheap) at level 0x3ffff, flags 0x1
[1266874889.709] 00000.00000> initializing heap
[1266874889.709] 00000.00000> INIT: cpu 0, calling hook 0xffffffff8004305c (vm) at level 0x50000, flags 0x1
[1266874889.709] 00000.00000> VM: reserving kernel region [ffffffff80000000, ffffffff800f7000) flags 0x28 name 'kernel_code'
[1266874889.709] 00000.00000> VM: reserving kernel region [ffffffff800f7000, ffffffff80139000) flags 0x8 name 'kernel_rodata'
[1266874889.709] 00000.00000> VM: reserving kernel region [ffffffff80139000, ffffffff8013c000) flags 0x18 name 'kernel_data'
[1266874889.709] 00000.00000> VM: reserving kernel region [ffffffff8013c000, ffffffff80166000) flags 0x18 name 'kernel_bss'
[1266874889.709] 00000.00000> VM: reserving kernel region [ffffffff8016a000, ffffffff8114b000) flags 0x18 name 'kernel_bootalloc'
[1266874889.709] 00000.00000> INIT: cpu 0, calling hook 0xffffffff8000197c (timer) at level 0x50003, flags 0x1
[00003.966] 00000.00000> initializing mp
[00003.966] 00000.00000> initializing threads
[00003.967] 00000.00000> initializing timers
[00003.967] 00000.00000> INIT: cpu 0, calling hook 0xffffffff800132d4 (debuglog) at level 0x6ffff, flags 0x1
[00003.970] 00000.00000> INIT: cpu 0, calling hook 0xffffffff8006e448 (global_prng_thread_safe) at level 0x6ffff, flags 0x1
[00003.971] 00000.00000> creating bootstrap completion thread
[00004.057] 00000.00000> top of bootstrap2()
[00004.058] 00000.00000> INIT: cpu 0, calling hook 0xffffffff80070c9c (dpc) at level 0x70000, flags 0x1
[00004.064] 00000.00000> INIT: cpu 0, calling hook 0xffffffff8009158c (magenta) at level 0x70000, flags 0x1
[00004.078] 00000.00000> initializing platform
[00004.078] 00000.00000> initializing target
[00004.078] 00000.00000> calling apps_init()
[00004.078] 00000.00000> INIT: cpu 0, calling hook 0xffffffff80014464 (ktrace) at level 0xaffff, flags 0x1
[00005.091] 00000.00000> ktrace: buffer at 0xffffffc000e01000 (33554432 bytes)
[00005.094] 00000.00000> INIT: cpu 0, calling hook 0xffffffff800384bc (userboot) at level 0xaffff, flags 0x1
[00005.161] 00000.00000> userboot: userboot rodata       0 @ [0x292f043000,0x292f045000)
[00005.163] 00000.00000> userboot: userboot code    0x2000 @ [0x292f045000,0x292f05a000)
[00005.165] 00000.00000> userboot: vdso/full rodata       0 @ [0x292f05a000,0x292f05f000)
[00005.167] 00000.00000> userboot: vdso/full code    0x5000 @ [0x292f05f000,0x292f061000)
[00005.193] 00000.00000> userboot: entry point             @ 0x292f046da0
[00005.293] 00000.00000> starting app shell
] [00005.296] 00000.00000> entering main console loop

Monday, June 19, 2017

RISC-V Magenta context switch


The code related to RISC-V Magenta context switching can be found in exception.S  and switch_to.h files.
A thread for a CPU is switched by a call to
static inline void riscv_switch_to(struct riscv_thread_state * prev,
                                   struct riscv_thread_state * next)
{
    __switch_to_aux(prev, next);
    __switch_to(prev, next);
    assert(get_current());
}
The registers are saved and restored from the riscv_thread_state structure.
/*
 * Integer register context switch
 * The callee-saved registers must be saved and restored.
 * 
 *   a0: previous task_struct (must be preserved across the switch)
 *   a1: next task_struct
 */
.section .text
FUNCTION(__switch_to)
    /*
    * $a0 == &prev->arch.stat
    * $a1 == &next->arch.stat
    */
    /* Save context into prev->arch.state */
    REG_S ra,  THREAD_RA(a0)
    REG_S sp,  THREAD_SP(a0)
    REG_S s0,  THREAD_S0(a0)
    REG_S s1,  THREAD_S1(a0)
    REG_S s2,  THREAD_S2(a0)
    REG_S s3,  THREAD_S3(a0)
    REG_S s4,  THREAD_S4(a0)
    REG_S s5,  THREAD_S5(a0)
    REG_S s6,  THREAD_S6(a0)
    REG_S s7,  THREAD_S7(a0)
    REG_S s8,  THREAD_S8(a0)
    REG_S s9,  THREAD_S9(a0)
    REG_S s10, THREAD_S10(a0)
    REG_S s11, THREAD_S11(a0)
    /* Restore context from next->arch.state */
    REG_L ra,  THREAD_RA(a1)
    REG_L sp,  THREAD_SP(a1)
    REG_L s0,  THREAD_S0(a1)
    REG_L s1,  THREAD_S1(a1)
    REG_L s2,  THREAD_S2(a1)
    REG_L s3,  THREAD_S3(a1)
    REG_L s4,  THREAD_S4(a1)
    REG_L s5,  THREAD_S5(a1)
    REG_L s6,  THREAD_S6(a1)
    REG_L s7,  THREAD_S7(a1)
    REG_L s8,  THREAD_S8(a1)
    REG_L s9,  THREAD_S9(a1)
    REG_L s10, THREAD_S10(a1)
    REG_L s11, THREAD_S11(a1)
    REG_L tp,  THREAD_TI(a1) /* Next thread_info pointer */
    /*return to $ra, the new $sp has been set*/
    ret
END(__switch_to)
The new thread $ra points to the next instruction after a call to __switch_to inside riscv_switch_to
(gdb) p/x newthread->arch.state
$137 = {ra = 0xffffffff80002f3c, sp = 0xffffffff8114cc80, s = {0xffffffff8114ccb0, 0xffffffff800031c4, 0x1, 0xffffffff800347f8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, fstate = {f = {
      0x0 <repeats 32 times>}, fcsr = 0x0}, ti = 0xffffffff8114ae80}
The ret instruction at the end of __switch_to resumes a thread at this address.
The scheduled out thread state has $ra pointing to same address.
(gdb) p/x oldthread->arch.state
$138 = {ra = 0xffffffff8000b3f8, sp = 0xffffffc002e04000, s = {0x0 <repeats 12 times>}, fstate = {f = {0x0 <repeats 32 times>}, fcsr = 0x0}, ti = 0xffffffff8116a9d8}
The riscv_switch_to disassembled listing is
(gdb) disassem 0xffffffff80002f3c
Dump of assembler code for function riscv_switch_to:
   0xffffffff80002f04 <+0>: addi sp,sp,-48
   0xffffffff80002f08 <+4>: sd ra,40(sp)
   0xffffffff80002f0c <+8>: sd s0,32(sp)
   0xffffffff80002f10 <+12>: sd s1,24(sp)
   0xffffffff80002f14 <+16>: addi s0,sp,48
   0xffffffff80002f18 <+20>: mv s1,ra
   0xffffffff80002f1c <+24>: sd a0,-40(s0)
   0xffffffff80002f20 <+28>: sd a1,-48(s0)
   0xffffffff80002f24 <+32>: ld a1,-48(s0)
   0xffffffff80002f28 <+36>: ld a0,-40(s0)
   0xffffffff80002f2c <+40>: jal ra,0xffffffff80002ee0 <__switch_to_aux>
   0xffffffff80002f30 <+44>: ld a1,-48(s0)
   0xffffffff80002f34 <+48>: ld a0,-40(s0)
   0xffffffff80002f38 <+52>: jal ra,0xffffffff800003a8 <__switch_to>
   0xffffffff80002f3c <+56>: jal ra,0xffffffff80002e70 <get_current>
   0xffffffff80002f40 <+60>: mv a5,a0
   0xffffffff80002f44 <+64>: seqz a5,a5
   0xffffffff80002f48 <+68>: andi a5,a5,255
   0xffffffff80002f4c <+72>: beqz a5,0xffffffff80002f78 <riscv_switch_to+116>
   0xffffffff80002f50 <+76>: mv a0,s1
   0xffffffff80002f54 <+80>: mv a1,s0
   0xffffffff80002f58 <+84>: lui a5,0x800f2
   0xffffffff80002f5c <+88>: addi a5,a5,1880 # 0xffffffff800f2758
   0xffffffff80002f60 <+92>: li a4,34
   0xffffffff80002f64 <+96>: lui a3,0x800f2
   0xffffffff80002f68 <+100>: addi a3,a3,1896 # 0xffffffff800f2768
   0xffffffff80002f6c <+104>: lui a2,0x800f2
   0xffffffff80002f70 <+108>: addi a2,a2,1840 # 0xffffffff800f2730
   0xffffffff80002f74 <+112>: jal ra,0xffffffff8006b9f0 <_panic>
   0xffffffff80002f78 <+116>: nop
   0xffffffff80002f7c <+120>: ld ra,40(sp)
   0xffffffff80002f80 <+124>: ld s0,32(sp)
   0xffffffff80002f84 <+128>: ld s1,24(sp)
   0xffffffff80002f88 <+132>: addi sp,sp,48
   0xffffffff80002f8c <+136>: ret
As you can see 0xffffffff80002f3c is an address of a call to get_current which is a parameter to the assert check after a call to __switch_to.

Monday, June 12, 2017

RISC-V Magenta. The init process.

The init process address space initialisation.
#2  0xffffffff80004690 in arch_mmu_init_aspace (aspace=0xffffffff81168310, base=16777216, size=274861125632, flags=0) at kernel/arch/riscv/mmu.cpp:73
#3  0xffffffff8004bb64 in VmAspace::Init (this=0xffffffff811681c0) at kernel/kernel/vm/vm_aspace.cpp:142
#4  0xffffffff8004bdb0 in VmAspace::Create (flags=0, name=0x0) at kernel/kernel/vm/vm_aspace.cpp:180
#5  0xffffffff8008f7e0 in ProcessDispatcher::Initialize (this=0xffffffff81167eb0) at kernel/lib/magenta/process_dispatcher.cpp:147
#6  0xffffffff8008ef30 in ProcessDispatcher::Create (job=..., name=..., flags=0, dispatcher=0xffffffff81146e48, rights=0xffffffff81146e3c, root_vmar_disp=0xffffffff81146e40, 
    root_vmar_rights=0xffffffff81146e38) at kernel/lib/magenta/process_dispatcher.cpp:73
#7  0xffffffff80032b84 in attempt_userboot () at kernel/lib/userboot/userboot.cpp:283
#8  0xffffffff800330ec in userboot_init (level=720895) at kernel/lib/userboot/userboot.cpp:357
#9  0xffffffff80005da8 in lk_init_level (required_flag=LK_INIT_FLAG_PRIMARY_CPU, start_level=655360, stop_level=720895) at kernel/top/init.c:86
#10 0xffffffff80005e4c in lk_primary_cpu_init_level (start_level=655360, stop_level=720895) at kernel/include/lk/init.h:51
#11 0xffffffff800060fc in bootstrap2 (arg=0x0) at kernel/top/main.c:136
#12 0xffffffff8000a78c in initial_thread_func () at kernel/kernel/thread.c:84
#13 0xffffffff8000a74c in init_thread_struct (t=0xffffffff81144be0, name=0x0) at kernel/kernel/thread.c:72

Thursday, June 8, 2017

Windows. Cache prefetching


00 nt!KiSwapContext
01 nt!KiSwapThread
02 nt!KiCommitThreadWait
03 nt!KeWaitForSingleObject
04 nt!MiWaitForInPageComplete
05 nt!MiPfCompleteInPageSupport
06 nt!MiPfCompletePrefetchIos
07 nt!MmWaitForCacheManagerPrefetch
08 nt!CcFetchDataForRead
09 nt!CcMapAndCopyFromCache
0a nt!CcCopyReadEx
0b nt!CcCopyRead

Tuesday, May 30, 2017

RISC-V GNU tool chain and relocation sections in shared and static libraries.

The library code
const
unsigned char __clz_tab[] =
{
  0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
  6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
  7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
  7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
  8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
  8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
  8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
  8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
};

long foo(long x)
{
    return __clz_tab[x % (sizeof(__clz_tab)/sizeof(__clz_tab[0]))];
}
Build the shared(.so) and static(.o) library.
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-gcc  -shared -ffreestanding -nostdlib -fPIC  -o relocation.so relocation.c
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-gcc  -c -ffreestanding -nostdlib -fPIC  -o relocation.o relocation.c
The shared library has a .got (Global Offset Table) section to reference the __clz_tab array. The .got section is adjusted by the loader with the relocation data from the .rela.dyn section.
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-objdump -hr relocation.so

relocation.so:     file format elf64-littleriscv

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .hash         00000030  00000000000000e8  00000000000000e8  000000e8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .dynsym       000000a8  0000000000000118  0000000000000118  00000118  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynstr       00000027  00000000000001c0  00000000000001c0  000001c0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .rela.dyn     00000018  00000000000001e8  00000000000001e8  000001e8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .text         00000038  0000000000000200  0000000000000200  00000200  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  5 .rodata       00000100  0000000000000238  0000000000000238  00000238  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .dynamic      000000e0  0000000000001338  0000000000001338  00000338  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  7 .got          00000020  0000000000001418  0000000000001418  00000418  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  8 .comment      00000011  0000000000000000  0000000000000000  00000438  2**0
                  CONTENTS, READONLY
                  
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-objdump --disassemble relocation.so

relocation.so:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000200 <foo>:
 200: fe010113           addi sp,sp,-32
 204: 00813c23           sd s0,24(sp)
 208: 02010413           addi s0,sp,32
 20c: fea43423           sd a0,-24(s0)
 210: fe843783           ld a5,-24(s0)
 214: 0ff7f793           andi a5,a5,255
 218: 00001717           auipc a4,0x1
 21c: 21873703           ld a4,536(a4) # 1430 <_GLOBAL_OFFSET_TABLE_+0x8>
 220: 00f707b3           add a5,a4,a5
 224: 0007c783           lbu a5,0(a5)
 228: 00078513           mv a0,a5
 22c: 01813403           ld s0,24(sp)
 230: 02010113           addi sp,sp,32
 234: 00008067           ret
 
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-readelf -r  relocation.so

Relocation section '.rela.dyn' at offset 0x1e8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000001430  000300000002 R_RISCV_64        0000000000000238 __clz_tab + 0
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-objdump -hr relocation.o

relocation.o:     file format elf64-littleriscv

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000038  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000078  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000078  2**0
                  ALLOC
  3 .rodata       00000100  0000000000000000  0000000000000000  00000078  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      00000012  0000000000000000  0000000000000000  00000178  2**0
                  CONTENTS, READONLY
RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000018 R_RISCV_GOT_HI20  __clz_tab
000000000000001c R_RISCV_PCREL_LO12_I  .L0 
The static library also has a .got (Global Offset Table) section to reference the __clz_tab array. The .got section is adjusted by the loader with the relocation data from the .rela.dyn section.
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-objdump -hr relocation.o

relocation.o:     file format elf64-littleriscv

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000038  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000078  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000078  2**0
                  ALLOC
  3 .rodata       00000100  0000000000000000  0000000000000000  00000078  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      00000012  0000000000000000  0000000000000000  00000178  2**0
                  CONTENTS, READONLY
RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000018 R_RISCV_GOT_HI20  __clz_tab
000000000000001c R_RISCV_PCREL_LO12_I  .L0 

slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-objdump --disassemble relocation.o

relocation.o:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000000 <foo>:
   0: fe010113           addi sp,sp,-32
   4: 00813c23           sd s0,24(sp)
   8: 02010413           addi s0,sp,32
   c: fea43423           sd a0,-24(s0)
  10: fe843783           ld a5,-24(s0)
  14: 0ff7f793           andi a5,a5,255

0000000000000018 <.L0 >:
  18: 00000717           auipc a4,0x0
  1c: 00073703           ld a4,0(a4) # 18 <.L0 >
  20: 00f707b3           add a5,a4,a5
  24: 0007c783           lbu a5,0(a5)
  28: 00078513           mv a0,a5
  2c: 01813403           ld s0,24(sp)
  30: 02010113           addi sp,sp,32
  34: 00008067           ret
  
slava@slava-P34V2:/work/risc-v/musl-riscv/tmp$ riscv64-unknown-linux-gnu-readelf -r  relocation.o

Relocation section '.rela.text' at offset 0x2a8 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000018  000800000014 R_RISCV_GOT_HI20  0000000000000000 __clz_tab + 0
00000000001c  000600000018 R_RISCV_PCREL_LO1 0000000000000018 .L0  + 0

Friday, May 19, 2017

Magenta RISC-V is running in kernel mode with threads.

Great news!

Magenta RISC-V port ( riscv-magenta ) reached a milestone. The kernel is running and switching between kernel threads.

Tuesday, May 16, 2017

RISC-V Magenta kernel manual stack unwinding

A manual stack unwinding when handle_exception sets a call frame
(gdb) bt
#0  0xffffffff8002e0c4 in _panic (caller=0xffffffff80002554 <do_trap_insn_misaligned+52>, frame=0xffffffff80040de8 <init_thread_union+7656>, fmt=0xffffffff800343c8 "%s unimplemented\n")
    at kernel/lib/debug/debug.c:32
#1  0xffffffff80002490 in do_trap_error (regs=0xffffffff80040e08 <init_thread_union+7688>, signo=7, code=1, addr=18446744071562076884, str=0xffffffff80034420 "Oops - instruction address misaligned")
    at kernel/arch/riscv/traps.c:52
#2  0xffffffff80002554 in do_trap_insn_misaligned (regs=0xffffffff80040e08 <init_thread_union+7688>) at kernel/arch/riscv/traps.c:67
#3  0xffffffff8000027c in handle_exception () at kernel/arch/riscv/rv64/exception.S:221
Backtrace stopped: frame did not save the PC
A frame stack has the following layout
struct frame{
    coid* caller_s0; // $s0
    void* caller_pc; // $ra
}
the current frame pointer is saved in the s0 register.
(gdb) p/x $s0
$1 = 0xffffffff80040d88
(gdb) x/2xg 0xffffffff80040d88-16
0xffffffff80040d78 <init_thread_union+7544>: 0xffffffff80040de8 0xffffffff80002490
(gdb) x/2xg 0xffffffff80040de8-16
0xffffffff80040dd8 <init_thread_union+7640>: 0xffffffff80040e08 0xffffffff80002554
(gdb) x/2xg 0xffffffff80040e08-16
0xffffffff80040df8 <init_thread_union+7672>: 0xffffffff80040f30 0xffffffff8000027c
(gdb) x/2xg 0xffffffff80040f30-16
0xffffffff80040f20 <init_thread_union+7968>: 0xffffffff80040f70 0xffffffff800022d4
(gdb) x/2xg 0xffffffff80040f70-16
0xffffffff80040f60 <init_thread_union+8032>: 0xffffffff80040fb0 0xffffffff80008f20
(gdb) x/2xg 0xffffffff80040fb0-16
0xffffffff80040fa0 <init_thread_union+8096>: 0xffffffff80040fe0 0xffffffff80008ff8
(gdb) x/5i 0xffffffff8000027c
   0xffffffff8000027c <handle_exception+352>: ld s1,256(sp)
   0xffffffff80000280 <handle_exception+356>: csrci sstatus,2
   0xffffffff80000284 <handle_exception+360>: andi s1,s1,256
   0xffffffff80000288 <handle_exception+364>: bnez s1,0xffffffff80000294 <handle_exception+376>
   0xffffffff8000028c <handle_exception+368>: addi s1,sp,280
(gdb) x/5i 0xffffffff800022d4
   0xffffffff800022d4 <arch_thread_construct_first+36>: sw a5,0(a4)
   0xffffffff800022d8 <arch_thread_construct_first+40>: jal ra,0xffffffff800021e4 <current_thread_info>
   0xffffffff800022dc <arch_thread_construct_first+44>: sd a0,-40(s0)
   0xffffffff800022e0 <arch_thread_construct_first+48>: ld a5,-40(s0)
   0xffffffff800022e4 <arch_thread_construct_first+52>: ld a4,-56(s0)
(gdb) x/5i 0xffffffff80008f20
   0xffffffff80008f20 <thread_construct_first+188>: addi a5,s0,-48
   0xffffffff80008f24 <thread_construct_first+192>: li a2,0
   0xffffffff80008f28 <thread_construct_first+196>: mv a1,a5
   0xffffffff80008f2c <thread_construct_first+200>: lui a5,0x80045
   0xffffffff80008f30 <thread_construct_first+204>: addi a0,a5,-1232
(gdb) x/5i 0xffffffff80008ff8
   0xffffffff80008ff8 <thread_init_early+112>: jal ra,0xffffffff80007678 <sched_init_early>
   0xffffffff80008ffc <thread_init_early+116>: nop
   0xffffffff80009000 <thread_init_early+120>: ld ra,40(sp)
   0xffffffff80009004 <thread_init_early+124>: ld s0,32(sp)
   0xffffffff80009008 <thread_init_early+128>: ld s1,24(sp)
GDB can not unwind after handle_exception as it is unable to verify that the handle_exception frame is valid, the handle_exception function has been written on assembler with a prolog that restores sp from a scratch register instead of a frame initalization. I added a frame pointer saving for handle_exception after the stack pointer restoration ( https://github.com/slavaim/riscv-magenta/blob/riscv/kernel/arch/riscv/rv64/exception.S ).
    /*a call frame to facilitate with debugging*/
    .macro SET_GDB_FRAME
    addi    sp, sp, -2*SZREG /* allocate the frame */
    REG_S   s0, 0(sp)        /* get the frame pointer at the exception moment */
    csrr    s0, sepc         /* get the exception PC */
    REG_S   s0, SZREG(sp)    /* set the exception PC as $ra for the frame */
    addi    s0, sp, 2*SZREG  /* set s0 to the current frame pointer */
    .endm

    .macro DEL_GDB_FRAME
    REG_L   s0, 0(sp)        /* restore the caller frame pointer */
    addi    sp, sp, 2*SZREG  /* restore the stack pointer */
    .endm
From the frame before the exception handler we see that the arch_thread_construct_first raised the exception. We need to examine this function prologue to get the offset to the bootom of the stack from the frame address.
(gdb) x/10i arch_thread_construct_first
   0xffffffff800022b0 <arch_thread_construct_first>: addi sp,sp,-64
   0xffffffff800022b4 <arch_thread_construct_first+4>: sd ra,56(sp)
   0xffffffff800022b8 <arch_thread_construct_first+8>: sd s0,48(sp)
   0xffffffff800022bc <arch_thread_construct_first+12>: sd s1,40(sp)
   0xffffffff800022c0 <arch_thread_construct_first+16>: addi s0,sp,64
   0xffffffff800022c4 <arch_thread_construct_first+20>: mv s1,ra
   0xffffffff800022c8 <arch_thread_construct_first+24>: sd a0,-56(s0)
   0xffffffff800022cc <arch_thread_construct_first+28>: li a4,0
   0xffffffff800022d0 <arch_thread_construct_first+32>: li a5,1
=> 0xffffffff800022d4 <arch_thread_construct_first+36>: sw a5,0(a4)
Now the $pc, $sp and $s0 (a frame pointer ) registers can be set to unwind the stack before handle_exception was called by a CPU. 64 bytes was subtracted from the s0 register value to get the sp register value according to the function prologue displayed above.
(gdb) set $pc=0xffffffff800022d4
(gdb) set $sp=0xffffffff80040f70-64
(gdb) set $s0=0xffffffff80040f70
(gdb) bt
#0  0xffffffff800022d4 in arch_thread_construct_first (t=0xffffffff800426d8 <idle_threads>) at kernel/arch/riscv/thread.c:34
#1  0xffffffff80008f20 in thread_construct_first (t=0xffffffff800426d8 <idle_threads>, name=0xffffffff80035368 "bootstrap") at kernel/kernel/thread.c:1016
#2  0xffffffff80008ff8 in thread_init_early () at kernel/kernel/thread.c:1037
#3  0xffffffff80003ec4 in lk_main () at kernel/top/main.c:53
#4  0xffffffff8000146c in _riscv_start () at kernel/arch/riscv/rv64/start.S:42
Backtrace stopped: frame did not save the PC

Monday, April 3, 2017

TLB flushing call on Windows

nt!KiRetireDpcList+0xd7
nt!KxRetireDpcList+0x5 (TrapFrame @ fffff800`cc332e70)
nt!KiDispatchInterruptContinue
nt!KiDpcInterrupt+0xca (TrapFrame @ ffffd000`a9b34d90)
nt!MiFlushTbList+0x20c
nt!MiDeleteSystemPagableVm+0x4d9
nt!MiPurgeSpecialPoolPaged+0x18
nt!MmFreeSpecialPool+0x3cf
nt!ExDeferredFreePool+0x677
nt!VerifierExFreePoolWithTag+0x44

Tuesday, March 7, 2017

RISC-V Linux memory regions on boot

This text is based on memory_areas_on_boot.md from my GitHub repo  riscv-notes

On boot the kernel has the following memory areas required for code execution
  • vmlinux ELF code and data sections mapped by the bootloader
  • the page tables for virtual memory support created by the bootloader
  • initial stack
The pages used by the above regions must be marked as reserved so they are not used for memory allocations.
As shown here https://github.com/slavaim/riscv-notes/blob/master/linux/memory-initialization.md the kernel makes the following calls for memory reservation.
    memblock_reserve(info.base, __pa(_end) - info.base);
    reserve_boot_page_table(pfn_to_virt(csr_read(sptbr)));
The first call to memblock_reserve is to reserve the area from &_start to &_end , this area is defined in the following linker script.
SECTIONS
{
    /* Beginning of code and text segment */
    . = LOAD_OFFSET;
    _start = .;
    __init_begin = .;
    HEAD_TEXT_SECTION
    INIT_TEXT_SECTION(PAGE_SIZE)
    INIT_DATA_SECTION(16)
    /* we have to discard exit text and such at runtime, not link time */
    .exit.text :
    {
        EXIT_TEXT
    }
    .exit.data :
    {
        EXIT_DATA
    }
    PERCPU_SECTION(L1_CACHE_BYTES)
    __init_end = .;

    .text : {
        _text = .;
        _stext = .;
        TEXT_TEXT
        SCHED_TEXT
        LOCK_TEXT
        KPROBES_TEXT
        ENTRY_TEXT
        IRQENTRY_TEXT
        *(.fixup)
        _etext = .;
    }

    /* Start of data section */
    _sdata = .;
    RO_DATA_SECTION(PAGE_SIZE)
    RW_DATA_SECTION(0x40, PAGE_SIZE, THREAD_SIZE)
    .sdata : {
        _gp = . + 0x800;
        *(.sdata*)
    }
    .srodata : {
        *(.srodata*)
    }
    /* End of data section */
    _edata = .;

    BSS_SECTION(0x20, 0, 0x20)

    EXCEPTION_TABLE(0x10)
    NOTES

    .rel.dyn : {
        *(.rel.dyn*)
    }

    _end = .;

    STABS_DEBUG
    DWARF_DEBUG

    DISCARDS
}
As you can see this area encompasses all kernel code and data excluding debug information. This area starts at ffffffff80000000. You can easily find the start and end addresses from the System.map file. These values for my test kernel 
ffffffff80000000 T _start
ffffffff803b10b4 R _end
The second call to reserve_boot_page_table reserves the initial page table pages.
Where is a stack reservation? The stack is reserved by the first call to memblock_reserve as the initial stack is allocated from the kernel data section. The initial stack is staically allocated as init_thread_union.stack . The init_thread_union has the following type definition in linux/linux-4.6.2/include/linux/sched.h
union thread_union {
    struct thread_info thread_info;
    unsigned long stack[THREAD_SIZE/sizeof(long)];
};
For my test kernel the address of the init_thread_union is again extracted from System.map as
ffffffff8035e000 D init_thread_union
As you can see it is in the range of the region [&_start,&_end) and is in the data section.
The stack register is set on boot by the kernel entry routine _start defined in linux/linux-4.6.2/arch/riscv/kernel/head.S
__INIT
ENTRY(_start)
...
    /* Initialize stack pointer */
    la sp, init_thread_union + THREAD_SIZE
    /* Initialize current task_struct pointer */
    la tp, init_task
 ...
 END(_start)

RISC-V Linux kernel memory initialization on boot.

This text is based on memory-initialization.md from my GitHub repo  riscv-notes
The kernel is started with virtual memory initialized by machine level bootloader BBL. The more detailed description can be found in this document - supervisor_vm_init.md .
The kernel start offset is defined in linux/linux-4.6.2/arch/riscv/include/asm/page.h
/*
 * PAGE_OFFSET -- the first address of the first page of memory.
 * When not using MMU this corresponds to the first free page in
 * physical memory (aligned on a page boundary).
 */
#ifdef CONFIG_64BIT
#define PAGE_OFFSET     _AC(0xffffffff80000000,UL)
#else
#define PAGE_OFFSET     _AC(0xc0000000,UL)
#endif
BBL initializes virtual memory for supervisor mode, maps the Linux kernel at PAGE_OFFSET, sets sptbr register value to a root page table physical address, switches to the supervisor mode with $pc set to the entry point _start. BBL does this in enter_supervisor_mode function defined in riscv-tools/riscv-pk/machine/minit.c
void enter_supervisor_mode(void (*fn)(uintptr_t), uintptr_t stack)
{
  uintptr_t mstatus = read_csr(mstatus);
  mstatus = INSERT_FIELD(mstatus, MSTATUS_MPP, PRV_S);
  mstatus = INSERT_FIELD(mstatus, MSTATUS_MPIE, 0);
  write_csr(mstatus, mstatus);
  write_csr(mscratch, MACHINE_STACK_TOP() - MENTRY_FRAME_SIZE);
  write_csr(mepc, fn);
  write_csr(sptbr, (uintptr_t)root_page_table >> RISCV_PGSHIFT);
  asm volatile ("mv a0, %0; mv sp, %0; mret" : : "r" (stack));
  __builtin_unreachable();
}
The important difference between RISC-V case and many other CPUs( e.g. x86 )is that Linux kernel's entry point is called with virtual memory initialized by boot loader executing at higher privilege mode.
The memory management is initialized inside setup_arch routine defined in linux/linux-4.6.2/arch/riscv/kernel/setup.c, below only memory management relevant part of the function is shown
void __init setup_arch(char **cmdline_p)
{
...
    init_mm.start_code = (unsigned long) _stext;
    init_mm.end_code   = (unsigned long) _etext;
    init_mm.end_data   = (unsigned long) _edata;
    init_mm.brk        = (unsigned long) _end;

    setup_bootmem();
    ....
    paging_init();
    ....
}
The _stext, _etext, _edata, _end global variables are defined in the linker script linux/linux-4.6.2/arch/riscv/kernel/vmlinux.lds.S which defines the kernel memory layout. These variables defines the kernel section borders. The thorough description regarding linkers scripts can be found here https://sourceware.org/binutils/docs/ld/Scripts.html .
The first function being called is setup_bootmem
static void __init setup_bootmem(void)
{
    unsigned long ret;
    memory_block_info info;

    ret = sbi_query_memory(0, &info);
    BUG_ON(ret != 0);
    BUG_ON((info.base & ~PMD_MASK) != 0);
    BUG_ON((info.size & ~PMD_MASK) != 0);
    pr_info("Available physical memory: %ldMB\n", info.size >> 20);

    /* The kernel image is mapped at VA=PAGE_OFFSET and PA=info.base */
    va_pa_offset = PAGE_OFFSET - info.base;
    pfn_base = PFN_DOWN(info.base);

    if ((mem_size != 0) && (mem_size < info.size)) {
        memblock_enforce_memory_limit(mem_size);
        info.size = mem_size;
        pr_notice("Physical memory usage limited to %lluMB\n",
            (unsigned long long)(mem_size >> 20));
    }
    set_max_mapnr(PFN_DOWN(info.size));
    max_low_pfn = PFN_DOWN(info.base + info.size);

#ifdef CONFIG_BLK_DEV_INITRD
    setup_initrd();
#endif /* CONFIG_BLK_DEV_INITRD */

    memblock_reserve(info.base, __pa(_end) - info.base);
    reserve_boot_page_table(pfn_to_virt(csr_read(sptbr)));
    memblock_allow_resize();
}
The Linux kernel queries the available memory size in setup_bootmem by invoking SBI interface's sbi_query_memorywhich results in a call to __sbi_query_memory BBL routine executed (suprisingly) in supervisor mode as SBI has been mapped to the supervisor virtual address space and ecall instruction is not invoked for sbi_query_memory
uintptr_t __sbi_query_memory(uintptr_t id, memory_block_info *p)
{
  if (id == 0) {
    p->base = first_free_paddr;
    p->size = mem_size + DRAM_BASE - p->base;
    return 0;
  }

  return -1;
}
More about SBI can be found here https://github.com/slavaim/riscv-notes/blob/master/bbl/sbi-to-linux.md .
The kernel reserves the pages occupied by the kernel with a call to memblock_reserve(info.base, __pa(_end) - info.base); . Then a call to reserve_boot_page_table(pfn_to_virt(csr_read(sptbr))); reserves the pages occupied by the page table allocated by the bootloader, i.e. BBL.The Linux kernel retrieves the page table allocated and initialized by BBL by reading a physical address from the sptbr register and converting it to a virtual address. The page table virtual address is also saved at the master kernel Page Tables init_mm.pgd. The snippet is from linux/linux-4.6.2/arch/riscv/mm/init.c
void __init paging_init(void)
{
    init_mm.pgd = (pgd_t *)pfn_to_virt(csr_read(sptbr));
  ....
}

Sunday, March 5, 2017

RISC-V SBI mapping to Linux

This text is based on sbi-to-linux.md from my GitHub repo  riscv-notes
The machine level SBI ( Supervisor Binary Interface ) is exported to the Linux kernel by mapping it at the top of the address space.
The mapping is performed by BBL in supervisor_vm_init defined in riscv-tools/riscv-pk/bbl/bbl.c.
  // map SBI at top of vaddr space
  extern char _sbi_end;
  uintptr_t num_sbi_pages = ((uintptr_t)&_sbi_end - DRAM_BASE - 1) / RISCV_PGSIZE + 1;
  assert(num_sbi_pages <= (1 << RISCV_PGLEVEL_BITS));
  for (uintptr_t i = 0; i < num_sbi_pages; i++) {
    uintptr_t idx = (1 << RISCV_PGLEVEL_BITS) - num_sbi_pages + i;
    sbi_pt[idx] = pte_create((DRAM_BASE / RISCV_PGSIZE) + i, PTE_G | PTE_R | PTE_X);
  }
  pte_t* sbi_pte = middle_pt + ((num_middle_pts << RISCV_PGLEVEL_BITS)-1);
  assert(!*sbi_pte);
  *sbi_pte = ptd_create((uintptr_t)sbi_pt >> RISCV_PGSHIFT);
You can read more on supervisor_vm_init here https://github.com/slavaim/riscv-notes/blob/master/bbl/supervisor_vm_init.md . From the code above you can see that the last page ending at _sbi_end physical address is mapped at the last page of the supervisor virtual address space.
The offsets to SBI entry points are defined in riscv-tools/riscv-pk/machine/sbi.S as
.globl sbi_hart_id; sbi_hart_id = -2048
.globl sbi_num_harts; sbi_num_harts = -2032
.globl sbi_query_memory; sbi_query_memory = -2016
.globl sbi_console_putchar; sbi_console_putchar = -2000
.globl sbi_console_getchar; sbi_console_getchar = -1984
.globl sbi_send_ipi; sbi_send_ipi = -1952
.globl sbi_clear_ipi; sbi_clear_ipi = -1936
.globl sbi_timebase; sbi_timebase = -1920
.globl sbi_shutdown; sbi_shutdown = -1904
.globl sbi_set_timer; sbi_set_timer = -1888
.globl sbi_mask_interrupt; sbi_mask_interrupt = -1872
.globl sbi_unmask_interrupt; sbi_unmask_interrupt = -1856
.globl sbi_remote_sfence_vm; sbi_remote_sfence_vm = -1840
.globl sbi_remote_sfence_vm_range; sbi_remote_sfence_vm_range = -1824
.globl sbi_remote_fence_i; sbi_remote_fence_i = -1808
These definitions are offsets from the top of the address space for the SBI trampoline stubs defined in riscv-tools/riscv-pk/machine/sbi_entry.S
  # hart_id
  .align 4
  li a7, MCALL_HART_ID
  ecall
  ret

  # num_harts
  .align 4
  lw a0, num_harts
  ret

  # query_memory
  .align 4
  tail __sbi_query_memory
The SBI trampoline stubs code start is defined as sbi_base and is aligned to a page boundary by align RISCV_PGSHIFTdirective. The first RISCV_PGSIZE - 2048 bytes are reserved by .skip RISCV_PGSIZE - 2048 directive so the first instruction starts at 2048 bytes offset from the page top defined as
.align RISCV_PGSHIFT
  .globl sbi_base
sbi_base:

  # TODO: figure out something better to do with this space.  It's not
  # protected from the OS, so beware.
  .skip RISCV_PGSIZE - 2048
The end of the section is also aligned at the page boundary and is defined as
  .align RISCV_PGSHIFT
  .globl _sbi_end
_sbi_end:
The SBI trampoline stubs section .sbi is placed at the end of BBL just before the payload by defining the layout in riscv-tools/riscv-pk/bbl/bbl.lds as
   .sbi :
  {
    *(.sbi)
  }

  .payload :
  {
    *(.payload)
  }

  _end = .;
So the supervisor_vm_init code that maps machine level physical addresses to supervisor virtuall addresses maps the BBL .sbi section which contains SBI trampoline stubs at the top of the supervisor virtual address space.
Linux kernel access SBI trampoline stubs by a call with offsets defined in linux/linux-4.6.2/arch/riscv/kernel/sbi.Swhich is a carbon copy of riscv-tools/riscv-pk/machine/sbi.S . For example a snippet from Linux kernel entry point _start defined in linux/linux-4.6.2/arch/riscv/kernel/head.S
    /* See if we're the main hart */
    call sbi_hart_id
    bnez a0, .Lsecondary_start
This code is translated by GCC to
   0xffffffff80000018 <+24>:    jalr    -2048(zero) # 0xfffffffffffff800
   0xffffffff8000001c <+28>:    bnez    a0,0xffffffff80000054 <_start+84>
The address 0xfffffffffffff800 is 2048 bytes offset from the top of the virtual address space last page. As we saw above this page is backed by a physical page with SBI trampoline stubs code starting at sbi_base machine level physical address. The dissassembling shows the SBI trampoline stubs at offsets
 0xfffffffffffff800 which is sbi_hart_id = -2048
 0xfffffffffffff810 which is sbi_num_harts = -2032
 0xfffffffffffff820 which is sbi_query_memory = -2016
 0xfffffffffffff830 which is sbi_console_putchar = -2000
 etc
(gdb) x/48i 0xfffffffffffff800
   0xfffffffffffff800:  li  a7,0
   0xfffffffffffff804:  ecall
   0xfffffffffffff808:  ret
   0xfffffffffffff80c:  nop
   0xfffffffffffff810:  auipc   a0,0xfffff
   0xfffffffffffff814:  lw  a0,-1888(a0)
   0xfffffffffffff818:  ret
   0xfffffffffffff81c:  nop
   0xfffffffffffff820:  j   0xffffffffffff92c0
   0xfffffffffffff824:  nop
   0xfffffffffffff828:  nop
   0xfffffffffffff82c:  nop
   0xfffffffffffff830:  li  a7,1
   0xfffffffffffff834:  ecall
   0xfffffffffffff838:  ret
   0xfffffffffffff83c:  nop
   0xfffffffffffff840:  li  a7,2
   0xfffffffffffff844:  ecall
   0xfffffffffffff848:  ret
   0xfffffffffffff84c:  nop
   0xfffffffffffff850:  unimp
   0xfffffffffffff854:  nop
   0xfffffffffffff858:  nop
   0xfffffffffffff85c:  nop
   0xfffffffffffff860:  li  a7,4
   0xfffffffffffff864:  ecall
   0xfffffffffffff868:  ret
   0xfffffffffffff86c:  nop
   0xfffffffffffff870:  li  a7,5
   0xfffffffffffff874:  ecall
   0xfffffffffffff878:  ret
   0xfffffffffffff87c:  nop
   0xfffffffffffff880:  lui a0,0x989
   0xfffffffffffff884:  addiw   a0,a0,1664
   0xfffffffffffff888:  ret
   0xfffffffffffff88c:  nop
   0xfffffffffffff890:  li  a7,6
   0xfffffffffffff894:  ecall
   0xfffffffffffff898:  nop
   0xfffffffffffff89c:  nop
   0xfffffffffffff8a0:  li  a7,7
   0xfffffffffffff8a4:  ecall
   0xfffffffffffff8a8:  ret
   0xfffffffffffff8ac:  nop
   0xfffffffffffff8b0:  j   0xffffffffffff92f8
   0xfffffffffffff8b4:  nop
   0xfffffffffffff8b8:  nop
   0xfffffffffffff8bc:  nop
As you can see not all SBI trampolines stubs invoke ecall system call to enter a higher privilege level, the machine level in this case. For example query_memory is just an unconditional jump to the SBI code mapped to the Linux kernel space.
 0xfffffffffffff820:    j   0xffffffffffff92c0
In that case the CPU doesn't switch to machine level and continues in the supervisor mode with virtual memory enabled. When CPU switches to the machine mode it disables virtual address translation and switches back to physical addresses. Below is a call stack when query_memory is called. A you can see the CPU continues with virtual address memory enabled and uses virtual addresses. The debugger was unable to resolve a call to query_memory in BBL as it was not aware about the code being remapped to the Linux system address space.
#0  0xffffffffffff92c8 in ?? ()
#1  0xffffffff80002c38 in setup_bootmem () at arch/riscv/kernel/setup.c:149
#2  setup_arch (cmdline_p=<optimized out>) at arch/riscv/kernel/setup.c:152
#3  0xffffffff80000898 in start_kernel () at init/main.c:500
#4  0xffffffff80000040 in _start () at arch/riscv/kernel/head.S:36
I guess that one of the possible reasons for such query_memory implementation is to simplify development as this function returns the structure which would require either packing it in registers or translating addresses either in the Linux kernel or in BBL SDI.
The call stack for sbi_hart_id looks differently
#0  0x0000000080000c90 in mcall_trap (regs=0x82660ec0, mcause=9, mepc=18446744073709549572) at ../machine/mtrap.c:210
#1  0x00000000800000ec in trap_vector () at ../machine/mentry.S:116
Backtrace stopped: frame did not save the PC
(gdb) p/x mepc
$4 = 0xfffffffffffff804
The virtual address translation is disabled and the CPU works with physical addresses. The debugger was unable to cross the boundary back to the Linux kernel stack that requires processing address space translation switching. As you can see the mpec register points to the ecall instruction virtual address in supervisor mode
   0xfffffffffffff800:  li  a7,0
   0xfffffffffffff804:  ecall
   0xfffffffffffff808:  ret