Wednesday, October 12, 2016

Windows and Linux kernels exception handling and stack unwinding

An interesting difference between the Windows and Linux kernels is the Windows mechanism for unwinding a call stack, also known as frame unwind. Both the 64-bit Windows kernel and the Linux kernel use table-based exception processing to locate a handler for the instruction that caused an exception. The Windows kernel can unwind the call stack to locate a handler in a caller, while Linux requires a table entry for each executable address range that can cause an exception.
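
The difference can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the `ex_entry` layout and both function names are hypothetical and match neither kernel's real table format, and the caller chain is passed in as an already captured array of return addresses instead of being recovered from unwind data. `ex_lookup` models a lookup for the faulting address only, while `ex_lookup_unwind` models walking up the call chain until some caller has a handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table entry: a range of instruction addresses
 * and the handler to run if an instruction in that range faults. */
struct ex_entry {
    uintptr_t start;   /* first covered address */
    uintptr_t end;     /* one past the last covered address */
    uintptr_t handler; /* address of the handler */
};

/* Binary search over a table sorted by start with non-overlapping ranges.
 * Returns the matching entry, or NULL when the address is not covered. */
const struct ex_entry *ex_lookup(const struct ex_entry *table, size_t count,
                                 uintptr_t fault_addr)
{
    size_t lo = 0, hi = count;

    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (fault_addr < table[mid].start)
            hi = mid;
        else if (fault_addr >= table[mid].end)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}

/* Frame-unwind style lookup: frames[0] is the faulting address and the
 * following elements are return addresses of the callers, innermost first.
 * The first frame covered by the table wins. */
const struct ex_entry *ex_lookup_unwind(const struct ex_entry *table,
                                        size_t count,
                                        const uintptr_t *frames,
                                        size_t nframes)
{
    for (size_t i = 0; i < nframes; i++) {
        const struct ex_entry *e = ex_lookup(table, count, frames[i]);

        if (e != NULL)
            return e;
    }
    return NULL;
}
```

With only `ex_lookup`, a fault at an address without its own entry is unhandled even if a caller is covered; `ex_lookup_unwind` finds the caller's entry in that case.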

You can look at pseudo-code for the 64-bit Windows RtlUnwind in StackWalk64.cpp.

Some resources on the 64-bit Windows SEH implementation:

1. Exceptional behavior: the Windows 8.1 X64 SEH Implementation

2. Exceptional Behavior - x64 Structured Exception Handling - OSR Online.

3. Johnson, Ken. "Programming against the x64 exception handling support."

Wednesday, September 28, 2016

What happens if a Linux kernel module unloads with a running kernel thread?

The kernel will oops in the page fault handler, in the context of the kernel thread, and terminate that thread. The thread keeps executing code that belonged to the module, and that memory is no longer mapped once the module has been unloaded.

Tuesday, September 6, 2016

Waiting for concurrent page fault completion

An interesting call stack: a thread waits in a page fault for another thread to finish paging in the same data from a file.

00 nt!KiSwapContext
01 nt!KiSwapThread
02 nt!KiCommitThreadWait
03 nt!KeWaitForSingleObject
04 nt!MiWaitForCollidedFaultComplete
05 nt!MiResolveTransitionFault
06 nt!MiResolveProtoPteFault
07 nt!MiDispatchFault
08 nt!MmAccessFault
09 nt!KiPageFault
0a nt!memcpy
0b nt!CcCopyBytesToUserBuffer
0c nt!CcMapAndCopyFromCache
0d nt!CcCopyReadEx
0e nt!CcCopyRead
0f nt!FsRtlCopyRead
10 ***
11 ***
12 ***
13 nt!NtReadFile
14 nt!KiSystemServiceCopyEnd

Friday, September 2, 2016

FileObjects and SectionObjectPointer in Windows.

Just for the record.

FileObject->SectionObjectPointer is allocated and set by a file system driver but the structure is managed by the Memory Manager (Mm). SectionObjectPointer is shared between all file objects for the same data stream.

FileObject->SectionObjectPointer->DataSectionObject and FileObject->SectionObjectPointer->ImageSectionObject contain the addresses of the ControlArea structures for the data and image sections.

ControlArea deletion is synchronized by ControlArea->WaitingForDeletion and ControlArea->u.Flags.BeingDeleted. WaitingForDeletion points to a structure with a notification event and a reference counter.

All functions that might destroy a control area take SectionObjectPointer as a parameter. These functions acquire a global lock and then check that ControlArea is not NULL. If the control area exists, ControlArea->u.Flags.BeingDeleted is checked. If it is set, the function increments the reference counter and waits on the WaitingForDeletion event. The event is deleted when the last waiting thread leaves the wait state and the reference counter drops to zero. A call to MiCleanSection sets SectionObjectPointer->DataSectionObject and SectionObjectPointer->ImageSectionObject to NULL. This call is synchronized with ControlArea->u.Flags.BeingDeleted.

The functions that might delete a control area include MmFlushImageSection and CcPurgeCacheSection. That means it is safe to pass SectionObjectPointer to these functions without synchronizing with file object deletion. It is even possible to call these functions with a SectionObjectPointer after all related file objects have been deleted or have had IopDeleteFile called for them, which might happen in the IRP_MJ_PNP processing path.

Friday, August 26, 2016

File mapping and FILE_OBJECT in Windows

There is a WinDBG command, !ca, that shows file mapping related information. I will show how to get this file mapping information for a file object (FILE_OBJECT type) by accessing the structures directly.

The core of file mapping (and of file data caching, which uses file mapping) is the SEGMENT object and the CONTROL_AREA structure. The SEGMENT object contains a pointer to an array of Prototype PTEs (ProtoPTEs) of _MMPTE_PROTOTYPE type. Each ProtoPTE points to the related physical page if the page is valid. When a file mapping is created, the PTEs (Page Table Entries) of the related virtual memory range have the invalid bit set and point to Prototype PTEs. When a corresponding virtual address is accessed, a page fault happens; the page fault handler follows the link to the ProtoPTE and fixes the process PTE to point to the real page. That allows all processes to share the same physical pages for the same file mapping. The physical page might need to be allocated and its data read in from the file if this has not been done before. After that the page is shared between all processes mapping the file.
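
The fault resolution described above can be modelled with a small user-space sketch. It is a toy model under stated assumptions: `proto_pte`, `proc_pte`, `next_free_frame` and `resolve_fault` are hypothetical names, the structures do not match the real kernel layouts, and "reading data from the file" is reduced to handing out a new frame number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy prototype PTE: shared by every process that maps the same file page. */
struct proto_pte {
    bool valid;      /* a physical page has been assigned */
    uint32_t frame;  /* hypothetical physical frame number */
};

/* Toy process PTE: invalid entries carry a link to the prototype PTE. */
struct proc_pte {
    bool valid;               /* the hardware would translate this entry */
    uint32_t frame;           /* meaningful only when valid */
    struct proto_pte *proto;  /* followed by the fault handler when invalid */
};

static uint32_t next_free_frame = 100;  /* stand-in for a page allocator */

/* Resolve a fault on a mapped-file address: make sure the prototype PTE has
 * a page (allocated and "read in" only once), then fix up the process PTE
 * so that it points to the same page. Returns the frame number. */
uint32_t resolve_fault(struct proc_pte *pte)
{
    assert(pte != NULL && pte->proto != NULL);
    if (!pte->valid) {
        if (!pte->proto->valid) {
            pte->proto->frame = next_free_frame++;
            pte->proto->valid = true;
        }
        pte->frame = pte->proto->frame;
        pte->valid = true;
    }
    return pte->frame;
}
```

Two `proc_pte` entries that link to the same `proto_pte` end up with the same frame, which is the sharing property the paragraph describes.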

FILE_OBJECT has a SectionObjectPointer field which is set by a file system driver (FSD), but all of its fields are initialized by the Memory Manager (Mm) and the Cache Manager (Cc). SectionObjectPointer is of _SECTION_OBJECT_POINTERS type, with the DataSectionObject field pointing to a CONTROL_AREA structure that in turn points to a SEGMENT object. CONTROL_AREA has a _SUBSECTION structure immediately following it at the tail; all subsequent _SUBSECTION structures are linked by the NextSubsection pointer. Each _SUBSECTION has a SubsectionBase field that points to the related ProtoPTE array.

Below, all of these structures are printed from WinDBG for a real file object.

0: kd> ??FileObject
struct _FILE_OBJECT * 0x8750ef80
   +0x000 Type             : 0n5
   +0x002 Size             : 0n128
   +0x004 DeviceObject     : 0x879c9030 _DEVICE_OBJECT
   +0x008 Vpb              : 0x879d5888 _VPB
   +0x00c FsContext        : 0x87bdde68 Void
   +0x010 FsContext2       : 0x863cc188 Void
   +0x014 SectionObjectPointer : 0x87bddea8 _SECTION_OBJECT_POINTERS
   +0x018 PrivateCacheMap  : 0x869acf90 Void
   +0x01c FinalStatus      : 0n0
   +0x020 RelatedFileObject : (null) 
   +0x024 LockOperation    : 0 ''
   +0x025 DeletePending    : 0 ''
   +0x026 ReadAccess       : 0 ''
   +0x027 WriteAccess      : 0 ''
   +0x028 DeleteAccess     : 0 ''
   +0x029 SharedRead       : 0 ''
   +0x02a SharedWrite      : 0 ''
   +0x02b SharedDelete     : 0 ''
   +0x02c Flags            : 0xc0012
   +0x030 FileName         : _UNICODE_STRING "\Sample Pictures\Chrysanthemum.jpg"
   +0x038 CurrentByteOffset : _LARGE_INTEGER 0x11000
   +0x040 Waiters          : 0
   +0x044 Busy             : 1
   +0x048 LastLock         : (null) 
   +0x04c Lock             : _KEVENT
   +0x05c Event            : _KEVENT
   +0x06c CompletionContext : (null) 
   +0x070 IrpListLock      : 0
   +0x074 IrpList          : _LIST_ENTRY [ 0x8750eff4 - 0x8750eff4 ]
   +0x07c FileObjectExtension : 0x8774e950 Void

0: kd> ??FileObject->SectionObjectPointer
struct _SECTION_OBJECT_POINTERS * 0x87bddea8
   +0x000 DataSectionObject : 0x863c1758 Void
   +0x004 SharedCacheMap   : 0x869acea0 Void
   +0x008 ImageSectionObject : (null) 

0: kd> dt nt!_CONTROL_AREA 0x863c1758 
   +0x000 Segment          : 0xaeb311a8 _SEGMENT
   +0x004 DereferenceList  : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x00c NumberOfSectionReferences : 1
   +0x010 NumberOfPfnReferences : 0x40
   +0x014 NumberOfMappedViews : 1
   +0x018 NumberOfUserReferences : 0
   +0x01c u                : <unnamed-tag>
   +0x020 FlushInProgressCount : 0
   +0x024 FilePointer      : _EX_FAST_REF
   +0x028 ControlAreaLock  : 0n0
   +0x02c ModifiedWriteCount : 0
   +0x02c StartingFrame    : 0
   +0x030 WaitingForDeletion : (null) 
   +0x034 u2               : <unnamed-tag>
   +0x040 LockedPages      : 0n1
   +0x048 ViewList         : _LIST_ENTRY [ 0x86a3a898 - 0x86a3a898 ]

0: kd> ??sizeof(nt!_CONTROL_AREA)
unsigned int 0x50

0: kd> dt nt!_SUBSECTION 0x863c1758+0x50
   +0x000 ControlArea      : 0x863c1758 _CONTROL_AREA
   +0x004 SubsectionBase   : 0xa8e9b008 _MMPTE
   +0x008 NextSubsection   : (null) 
   +0x00c PtesInSubsection : 0x40
   +0x010 UnusedPtes       : 0
   +0x010 GlobalPerSessionHead : (null) 
   +0x014 u                : <unnamed-tag>
   +0x018 StartingSector   : 0
   +0x01c NumberOfFullSectors : 0x40

0: kd> dt nt!_SEGMENT  0xaeb311a8 
   +0x000 ControlArea      : 0x863c1758 _CONTROL_AREA
   +0x004 TotalNumberOfPtes : 0x40
   +0x008 SegmentFlags     : _SEGMENT_FLAGS
   +0x00c NumberOfCommittedPages : 0
   +0x010 SizeOfSegment    : 0x40000
   +0x018 ExtendInfo       : (null) 
   +0x018 BasedAddress     : (null) 
   +0x01c SegmentLock      : _EX_PUSH_LOCK
   +0x020 u1               : <unnamed-tag>
   +0x024 u2               : <unnamed-tag>
   +0x028 PrototypePte     : 0xa8fe97e8 _MMPTE
   +0x030 ThePtes          : [1] _MMPTE

1: kd> dt nt!_MMPTE .
   +0x000 u                :
      +0x000 Long             : Uint8B
      +0x000 VolatileLong     : Uint8B
      +0x000 HighLow          : _MMPTE_HIGHLOW
      +0x000 Flush            : _HARDWARE_PTE
      +0x000 Hard             : _MMPTE_HARDWARE
      +0x000 Proto            : _MMPTE_PROTOTYPE
      +0x000 Soft             : _MMPTE_SOFTWARE
      +0x000 TimeStamp        : _MMPTE_TIMESTAMP
      +0x000 Trans            : _MMPTE_TRANSITION
      +0x000 Subsect          : _MMPTE_SUBSECTION
      +0x000 List             : _MMPTE_LIST

1: kd> dt nt!_MMPTE_PROTOTYPE
   +0x000 Valid            : Pos 0, 1 Bit
   +0x000 Unused0          : Pos 1, 7 Bits
   +0x000 ReadOnly         : Pos 8, 1 Bit
   +0x000 Unused1          : Pos 9, 1 Bit
   +0x000 Prototype        : Pos 10, 1 Bit
   +0x000 Protection       : Pos 11, 5 Bits
   +0x000 Unused           : Pos 16, 16 Bits
   +0x000 ProtoAddress     : Pos 32, 32 Bits
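
The two address computations used in the dump above can be written down as a short C sketch. It assumes the bit positions of _MMPTE_PROTOTYPE exactly as printed for this particular build (they differ between Windows versions and architectures), and the names `proto_fields`, `decode_proto_pte` and `first_subsection` are hypothetical helpers, not kernel or debugger APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of interest from the _MMPTE_PROTOTYPE layout printed above. */
struct proto_fields {
    bool valid;              /* Pos 0, 1 bit */
    bool read_only;          /* Pos 8, 1 bit */
    bool prototype;          /* Pos 10, 1 bit */
    unsigned protection;     /* Pos 11, 5 bits */
    uint32_t proto_address;  /* Pos 32, 32 bits */
};

/* Split a raw 64-bit PTE value into the fields listed above. */
struct proto_fields decode_proto_pte(uint64_t raw)
{
    struct proto_fields f;

    f.valid = ((raw >> 0) & 1u) != 0;
    f.read_only = ((raw >> 8) & 1u) != 0;
    f.prototype = ((raw >> 10) & 1u) != 0;
    f.protection = (unsigned)((raw >> 11) & 0x1fu);
    f.proto_address = (uint32_t)(raw >> 32);
    return f;
}

/* The first _SUBSECTION starts right after the CONTROL_AREA, so its address
 * is the control area address plus sizeof(nt!_CONTROL_AREA). Addresses are
 * 32-bit here because the dump comes from a 32-bit system. */
uint32_t first_subsection(uint32_t control_area, uint32_t control_area_size)
{
    assert(control_area_size != 0);
    return control_area + control_area_size;
}
```

For the dump above, `first_subsection(0x863c1758, 0x50)` gives 0x863c17a8, which is the address passed to `dt nt!_SUBSECTION`.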

Tuesday, August 23, 2016

Mac OS X file system redirector

I committed a new project to my GitHub repository: MacOSX-VFS-redirector, a filter that redirects file system requests. The project is based on MacOSX-FileSystem-Filter.

The filter redirects file creation, open requests, rename and data IO (read, write) from an application to a shadow directory where shadow copies of files are created. The shadow directory path can cross mount points. An application under control is not aware of the redirection and believes it works with the original files through unmodified paths. Applications under control are registered in the gApplicationsData array. The array is declared in ApplicationsData.cpp.
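
The idea of mapping an original path to a shadow path can be illustrated with a minimal user-space sketch. This is not the project's code: `redirect_path`, its parameters and the prefix-replacement scheme are assumptions made purely for illustration, and the sketch assumes that `root` and `shadow` are absolute paths without a trailing slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite `path` so that the leading `root` directory is
 * replaced by `shadow`. Returns 0 on success, -1 when `path` is not under
 * `root` or the result does not fit into `out`. */
int redirect_path(const char *path, const char *root, const char *shadow,
                  char *out, size_t out_size)
{
    size_t root_len;
    int n;

    assert(path != NULL && root != NULL && shadow != NULL && out != NULL);
    root_len = strlen(root);
    if (strncmp(path, root, root_len) != 0)
        return -1;
    /* Require a path component boundary so "/data" does not match "/database". */
    if (path[root_len] != '/' && path[root_len] != '\0')
        return -1;
    n = snprintf(out, out_size, "%s%s", shadow, path + root_len);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For example, with root "/Users/a" and shadow "/Volumes/shadow", the path "/Users/a/doc.txt" becomes "/Volumes/shadow/doc.txt", while "/Users/ab/doc.txt" is rejected.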

The filter employs a user mode client for data modification and shadow file creation. See the processing of VFSDataType_PreOperationCallback in the user mode client's main.cpp.

The filter's core is VFSHooks.cpp. It contains VFS hooks that intercept file creation and open requests, redirect IO and call the user client.

The filter was tested on Mac OS X Yosemite (10.10) and Mac OS X El Capitan (10.11).