Saturday, March 26, 2016

Building Linux kernel for Intel Galileo QuarkX1000

Intel Galileo board features Intel Quark X1000 SoC with a single core single threaded x86 CPU similar to Pentium.

Intel Galileo with OLIMEX JTAG and FTDI USB-to-Serial


I will show how to build a 3.8.7 version of the Linux kernel for a Linux image iot-devkit-201510010757-mmcblkp0-galileo.direct.xz that can be downloaded from  Intel® Galileo Board Downloads under Intel® Galileo Board microSD Card Linux* Operating System Image caption, at the time of writing this was the latest stable OS image build for the board.

Below are the steps to build a kernel and modules

1. Get BSPv1.1.0 package from Intel® Quark™ BSP Release Archive or from my GitHub BSPv1.1.0 where it is stored for convinience, you need v1.1.0. Its file name is Board_Support_Package_Sources_for_Intel_Quark_v1.1.0.7z . The newer BSP versions contain bugs and I believe were never used by Intel to build any kernel, for example patch list in BSPv1.2.1.7 doesn't match with the kernel version 3.14 used in upstream.cfg file and gitsetup.py script is missing, though it can be borrowed from an older BSP package it shows that BSPv1.2.1.7 was released without any sanity check.

2. Extract BSPv1.1.0 . Below BSP_DIR must be replaced with a full path to the directory where BSP package was extracted, for example /work/IntelGalileo/BSPv1.1.0/ .

3. Build the tools for cross compilation. Replace BSP_DIR with a path to extracted BSP package. This could take a couple of hours depending on Internet connection and CPUs power. Replace BSP_DIR with a path to extracted BSP package.

   $ cd BSP_DIR
   $ tar zxf meta-clanton_v1.1.0-dirty.tar.gz
   $ cd meta-clanton_v1.1.0-dirty
   $ ./setup.sh -e meta-clanton-bsp
   $ source iot-devkit-init-build-env build
   $ bitbake image-full

4. Build the kernel. To build the kernel you need a .config file. Intel documents will direct you to use  meta/cfg/kernel-cache/bsp/quark/quark.cfg from the kernel sources downloaded and patched by running gitsetup.py but a kernel built with this config file doesn't contain modules for memory card interface so the kernel unable to mount file system from the card. The better way is to borrow a .config file from a running system created from iot-devkit-201510010757-mmcblkp0-galileo.direct.xz image by executing 
   $ zcat /proc/config.gz > .config
and copying the file to a building machine. Alternatively the file can be downloaded from my GitHub repository by this link .config .

Extract the kernel package. It contains a script to obtain the source code and patches.

   $ cd BSP_DIR
   $ tar zxf quark_linux_v3.8.7+v1.1.0.tar.gz
   $ mv quark_linux_v3.8.7+v1.1.0 linux_v3.8.7
   $ cd linux_v3.8.7

The following two steps can be skipped if you already have set git user name and email, if a user and email are not defined there will be errors while executing gitsetup.py, note that a user and email are not required to be genuine

   $ git config –-global user.name  "user"  
   $ git config –-glob al user.email "user@hotmail.com"

Execute gitsetup.py, this will clone and patch the required kernel version from the kernel repository.

   $ ./gitsetup.py

Change a directory to ./work that contains the source code and copy a config file

   $ cd work
   $ cp <A .config file obtained as described above> .config

Set a path for cross-compiler. Replace BSP_DIR with a path to extracted BSP package.

   $ export PATH=BSP_DIR/meta-clanton_v1.1.0-dirty/build/tmp/sysroots/x86_64-linux/usr/bin/i586-poky-linux:$PATH

The following command will build a kernel and modules.

   $ ARCH=i386 CROSS_COMPILE=i586-poky-linux- make -j4

Thursday, March 24, 2016

What happens with outstanding IRPs when a process terminates.

When Windows kernel terminates a process it inserts APC in each thread, in turn this APC calls PspExitThread that calls IoCancelThreadIo to cancel if possible all outstanding IRPs by calling IoCancelIrp and waits for IRP cancelation or completion. A thread waits only for IRPs that have been associated with a thread by calling IopQueueThreadIrp that adds IRP in a list of IRPs associated with a thread, the list head is IrpList field of the ETHREAD structure.

A thread will be blocked until any of the two conditions takes place
 - IrpList becomes empty, that means all outstanding IRPs completed in a normal way or were cancelled
 - 5 minutes timeout expired, in that case IopDisassociateThreadIrp is called to perform IRPs disassociation by removing them from IrpList and setting IRP->Tail.Overlay.Thread to NULL

Below is a call stack for a terminating thread with four outstanding IRPs ( marked yellow ).

        THREAD 870788e8  Cid 1100.0f5c  Teb: 7ffab000 Win32Thread: 00000000 WAIT: (DelayExecution) KernelMode Non-Alertable
            86d29460  SynchronizationEvent
        IRP List:
            87305dc8: (0006,0100) Flags: 00060a00  Mdl: 00000000
            86ecd6f8: (0006,0100) Flags: 00060a00  Mdl: 00000000
            862fd7b8: (0006,0100) Flags: 00060a00  Mdl: 00000000
            87221d80: (0006,0100) Flags: 00060a00  Mdl: 00000000
        Not impersonating
        DeviceMap                 975b0820
        Owning Process            8514a030       Image:         explorer.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      22339          Ticks: 1 (0:00:00:00.015)
        Context Switch Count      2734           IdealProcessor: 1          
        UserTime                  00:00:00.000
        KernelTime                00:00:00.031
        Win32 Start Address 0x769842ed
        Stack Init a86bfed0 Current a86bfa38 Base a86c0000 Limit a86bd000 Call 0
        Priority 10 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 2
        ChildEBP RetAddr
        a86bfa50 82ad269d nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
        a86bfa88 82ad14f7 nt!KiSwapThread+0x266
        a86bfab0 82ad11d5 nt!KiCommitThreadWait+0x1df
        a86bfb0c 82cb9171 nt!KeDelayExecutionThread+0x2aa
        a86bfb40 82cbe519 nt!IoCancelThreadIo+0x70
        a86bfbb4 82cd2051 nt!PspExitThread+0x48e
        a86bfbcc 82b058c0 nt!PsExitSpecialApc+0x22
        a86bfc1c 82a922a4 nt!KiDeliverApc+0x28b
        a86bfc1c 778770b4 nt!KiServiceExit+0x64 (FPO: [0,3] TrapFrame @ a86bfc34)
        074fe3e4 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

A main process thread waits for child threads termination.

        THREAD 86e307f0  Cid 1100.1104  Teb: 7ffdf000 Win32Thread: fe9c8a88 WAIT: (Executive) KernelMode Non-Alertable
            870788e8  Thread
        Not impersonating
        DeviceMap                 975b0820
        Owning Process            8514a030       Image:         explorer.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      21665          Ticks: 675 (0:00:00:10.530)
        Context Switch Count      15565          IdealProcessor: 1          
        UserTime                  00:00:00.249
        KernelTime                00:00:00.530
        Win32 Start Address 0x00e50efa
        Stack Init a87b5ed0 Current a87b5a38 Base a87b6000 Limit a87b3000 Call 4f0
        Priority 12 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5

        ChildEBP RetAddr
        a87b5a50 82ad269d nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
        a87b5a88 82ad14f7 nt!KiSwapThread+0x266
        a87b5ab0 82acb0cf nt!KiCommitThreadWait+0x1df
        a87b5b2c 82cbe28e nt!KeWaitForSingleObject+0x393
        a87b5bb4 82cd2051 nt!PspExitThread+0x203
        a87b5bcc 82b058c0 nt!PsExitSpecialApc+0x22
        a87b5c1c 82a922a4 nt!KiDeliverApc+0x28b
        a87b5c1c 77876fc0 nt!KiServiceExit+0x64 (FPO: [0,3] TrapFrame @ a87b5c34)
        000ffb18 00000000 ntdll!KiUserCallbackDispatcher (FPO: [0,0,0])

Tuesday, March 8, 2016

Mac OS X file system filtering via hooking technique.

 Mac OS X doesn't support a full fledged file system filtering like Windows as Apple believes that BSD stackable file system doesn't fit Mac OS X. The available kernel authorization subsystem ( kauth ) allows only filtering open requests and a limited number of operations on file/directory. It doesn't allow filtering read and write operations and provides a limited control over file system operations as a vnode is already created at the moment of kauth callback invoking. In Windows parlance a kauth callback is a postoperation callback for create/open request ( IRP_MJ_CREATE for Windows ), that is all you have for Mac OS X. Not much really.

 The filtering can also be implemented by registering MAC ( Mandatory Access Control ) layer. But MAC has limited functionality and not consistent in relation to file system filtering as it was not designed as a file system filter layer. Instead of being called by VFS layer MAC registered callbacks are scattered through the kernel code. They are called from system calls and other kernel subsystems, so MAC doesn't provide a consistent interface for VFS filter and if I remember correctly MAC was declared by Apple as deprecated for usage by third party developers.

 To summarize the above, there is no official support from Apple if you need to filter read or/and write requests, filter and modify VFS vnode operation.

 The lack of a stackable file system support by Mac OS X VFS required to find a way to place a filter between VFS invoking vnode operations via VNOP_* functions and a file system driver implementing these vnode operations. I developed a hooking technique for VFS layer that emulates a stackable file system by replacing vnode operations in array. This technique allows to place a filter between VFS and file system driver and supports sophisticated filtering such as isolation file system filter when a filter creates vnodes instead of a file system driver thus gaining a full control over file data. I used this technique in two projects to implement filtering for lookup, create, read and write requests and to implement an isolation file system.

 The code for a skeleton file system filter is in my GitHub repository https://github.com/slavaim/MacOSX-FileSystem-Filter it was extracted from a bigger project MacOSX-Kernel-Filter in the same repository.

 FltFakeFSD.h and FltFakeFSD.cpp are optional files that helps to infer the vnode and vnodeop_desc structures layout that are not declared in SDK.  This is achieved by registering a dummy file system, creating a vnode and inspecting it to find required offsets. All you need to do is to call FltGetVnodeLayout() in a filter initialization code.

 Alternatively you can extract declarations from XNU code at opensource.apple.com. An alternative implementation without using the fake FSD can be found in VersionDependent.h and VersionDependent.cpp files, where struct vnodeop_desc_Yosemite was borrowed from Apple's open source code for Yosemite(10.10), it happened that vnode and vnodeop_desc structures haven't change in all latest kernel versions so this code works for Mavericks(10.9) and El Capitan(10.11).

 The filter registers a kauth callback FltIOKitKAuthVnodeGate::VnodeAuthorizeCallback which in turn calls FltHookVnodeVopAndParent that hooks vnode related operations. The kauth callback is used only to trigger filtering for a file system. When an application opens a file FileX in a directory DirX a vnode for a DirX is provided as a parameter for VnodeAuthorizeCallback so the following calls to file system's lookup or create operations for FileX will be visible for file system filter. When an original lookup or create returns a filter calls FltHookVnodeVopAndParent for a returned vnode or does this in a kauth callback which is nearly the same as kauth callback is a postoperation callback called after VNOP_LOOKUP or VNOP_CREATE. In case of lookup vnode operation do not forget about name cache, so lookup is not always called when an application calls open( FileX ), but if you need to see all lookup operations it is possible to disable name caching for a vnode. You can devise your own way of finding vnodes to hook depending on your requirements, for my purposes a kauth callback worked well as I was filtering vnodes of VREG type so I can use VDIR vnode as an anchor that started filtering for VREG vnodes. Also in most file systems vnode operations array is shared between all vnodes of VREG and VDIR type.

 The filter registers operations it wants to intercept in gFltVnodeVopHookEntries array. The skeleton filter immediately calls an original function from a hook but if you want to see a filter in action look at DldVNodeHook.cpp which is a more advanced implementation at MacOSX-Kernel-Filter that also implements an isolation filter for read and write operations on a file by redirecting read and write to a sparse file on another volume. The isolation filter is implemented at DldCoveringVnode.cpp and a sparse file at DldSparseFile.cpp

FYI a set of call stacks when hooks are active
  • Lookup hook
    frame #0: 0xffffff7f903b7a11 FsdFilter`FltVnopLookupHook(ap=0xffffff80a7053a18) + 17 at VFSHooks.cpp:65
    frame #1: 0xffffff800dd3f2f8 kernellookup(ndp=0xffffff80a7053d58) + 968 at kpi_vfs.c:2783
    frame #2: 0xffffff800dd3ea95 kernel`namei(ndp=0xffffff80a7053d58) + 1941 at vfs_lookup.c:371
    frame #3: 0xffffff800dd52005 kernel`nameiat(ndp=0xffffff80a7053d58, dirfd=<unavailable>) + 133 at vfs_syscalls.c:2920
    frame #4: 0xffffff800dd5fcc7 kernel`fstatat_internal(segflg=<unavailable>, ctx=<unavailable>, path=<unavailable>, ub=<unavailable>, xsecurity=<unavailable>, xsecurity_size=<unavailable>, isstat64=<unavailable>, fd=<unavailable>, flag=<unavailable>) + 231 at vfs_syscalls.c:5268
    frame #5: 0xffffff800dd54f0a kernel`stat64(p=<unavailable>, uap=0xffffff801e3f1380, retval=<unavailable>) + 58 at vfs_syscalls.c:5413
    frame #6: 0xffffff800e04dcb2 kernel`unix_syscall64(state=0xffffff801dfc5540) + 610 at systemcalls.c:366
  • Pagein hook
    frame #0: 0xffffff7f903b7b91 FsdFilterFlt`VnopPageinHook(ap=0xffffff80a76aba20) + 17 at VFSHooks.cpp:229
    frame #1: 0xffffff800e046372 kernel`vnode_pagein(vp=0xffffff801904db40, upl=0x0000000000000000, upl_offset=<unavailable>, f_offset=73416704, size=<unavailable>, flags=0, errorp=0xffffff80a76abae8) + 402 at kpi_vfs.c:4980
    frame #2: 0xffffff800db952a8 kernelv`node_pager_cluster_read(vnode_object=0xffffff80a76abb10, base_offset=73416704, offset=<unavailable>, io_streaming=<unavailable>, cnt=<unavailable>) + 72 at bsd_vm.c:1045
    frame #3: 0xffffff800db943f3 kernel`vnode_pager_data_request(mem_obj=0xffffff8018fb5e38, offset=73433088, length=<unavailable>, desired_access=<unavailable>, fault_info=<unavailable>) + 99 at bsd_vm.c:826
    frame #4: 0xffffff800db9f75b kernel`vm_fault_page(first_object=0x0000000000000000, first_offset=73433088, fault_type=1, must_be_resident=0, caller_lookup=0, protection=0xffffff80a76abe84, result_page=<unavailable>, top_page=<unavailable>, type_of_fault=<unavailable>, error_code=<unavailable>, no_zero_fill=<unavailable>, data_supply=0, fault_info=0xffffff80a76abbe0) + 3051 at memory_object.c:2178
    frame #5: 0xffffff800dba3902 kernel`vm_fault_internal(map=0xffffff8010f761e0, vaddr=140735436247040, fault_type=1, change_wiring=0, interruptible=2, caller_pmap=0x0000000000000000, caller_pmap_addr=<unavailable>, physpage_p=0x0000000000000000) + 3042 at vm_fault.c:4423
    frame #6: 0xffffff800dc1ec9c kernel`user_trap(saved_state=<unavailable>) + 732 at vm_fault.c:3229
  • Close hook
    frame #0: 0xffffff7f903b7a91 FsdFilter`FltVnopCloseHook(ap=0xffffff80a78cbd28) + 17 at VFSHooks.cpp:119
    frame #1: 0xffffff800dd734b7 kernel`VNOP_CLOSE(vp=0xffffff801c7144b0, fflag=<unavailable>, ctx=<unavailable>) + 215 at kpi_vfs.c:3047
    frame #2: 0xffffff800dd671d1 kernel`vn_close(vp=0xffffff801c7144b0, flags=<unavailable>, ctx=0xffffff80a78cbe60) + 225 at vfs_vnops.c:723
    frame #3: 0xffffff800dd6850f kernel`vn_closefile(fg=<unavailable>, ctx=0xffffff80a78cbe60) + 159 at vfs_vnops.c:1494
    frame #4: 0xffffff800dfb1680 kernel`closef_locked [inlined] fo_close(fg=0xffffff801ba367e0, ctx=0xffffff801bcb42a0) + 14 at kern_descrip.c:5711
    frame #5: 0xffffff800dfb1672 kernel`closef_locked(fp=<unavailable>, fg=0xffffff801ba367e0, p=0xffffff8019600650) + 354 at kern_descrip.c:4982
    frame #6: 0xffffff800dfad88e kernelclose_internal_locked(p=0xffffff8019600650, fd=<unavailable>, fp=0xffffff801fc113d8, flags=<unavailable>) + 542 at kern_descrip.c:2765
    frame #7: 0xffffff800dfb13d6 kernel`close_nocancel(p=0xffffff8019600650, uap=<unavailable>, retval=<unavailable>) + 342 at kern_descrip.c:2666
    frame #8: 0xffffff800e04dcb2 kernel`unix_syscall64(state=0xffffff801bc20b20) + 610 at systemcalls.c:366
  • Inactive hook
    frame #0: 0xffffff7fa9dc6ac0 FsdFilter`FltVnopInactiveHook(ap=0xffffff803c5198cc) at VFSHooks.cpp:140
    frame #1: 0xffffff8027775e06 kernelVNOP_INACTIVE(vp=0xffffff803c519870, ctx=<unavailable>) + 70 at kpi_vfs.c:4756
    frame #2: 0xffffff8027745543 kernel`vnode_rele_internal(vp=0xffffff80bfecbd90, fmode=<unavailable>, dont_reenter=<unavailable>, locked=<unavailable>) + 435 at vfs_subr.c:1929
    frame #3: 0xffffff80279c5359 kernel`proc_exit(p=0xffffff80338eb000) + 2841 at vfs_subr.c:1845
    frame #4: 0xffffff802756447a kernel`thread_terminate_self + 474 at thread.c:466
    frame #5: 0xffffff802756878b kernel`special_handler(rh=<unavailable>, thread=<unavailable>) + 219 at thread_act.c:891
    frame #6: 0xffffff802756855a kernel`act_execute_returnhandlers + 202 at thread_act.c:801
    frame #7: 0xffffff8027536e76 kernel`ast_taken(reasons=<unavailable>, enable=<unavailable>) + 278 at ast.c:177
    frame #8: 0xffffff802761eeae kernel`i386_astintr(preemption=<unavailable>) + 46 at trap.c:1171
The module has been made unloadable by taking a reference to IOKit com_FsdFilter class in com_FsdFilter::start routine. The reason is that unload for a hooking file system filter driver is a dangerous operation as it is hard to implement it correctly to avoid race conditions and preserve data consistency. To process unload correctly you need to process the following cases
  • unhook all vnode tables
  • check that there is no return to the hooking code in any call stack or else unloading a module results in a panic as control returns to unloaded code
  • check that there is no hooking code being called, i.e. there is no a processor with IP register pointing to any function in a driver
  • ensure that data changed by a filter will not crash the system or damage user or system data, for example if the driver implements any data modification it should guarantee data consistency on unload
The only feasible way to implement this is dividing a driver in two parts - one part can be unloaded and another part is never unloaded. Unloading complicates the code as you have to synchronize the code with unloading procedure, this increases complexity and as a result of this complexity and nondeterminism in the system state it's never implemented correctly.

Tuesday, March 1, 2016

A case of successful registration of an incorrectly defined file system minifilter.

 An interesting observation. If you forget to terminate FLT_OPERATION_REGISTRATION array with IRP_MJ_OPERATION_END then no instances will be attached but a minifilter is successfully registered and InstanceSetup callback is called. No any error is reported. Just yet another case of a closed source Microsoft subsystem with inconsistent behavior when you can spent hours chasing a bug by trial and error approach instead of looking at source code.