Saturday, March 29, 2014

A promiscuous MmProbeAndLockPages

Consider a scenario where a Windows driver, sweeping through a process address space, somehow gets a pointer to a valid address range in the system address space and wants that range to be accessible at IRQL greater than or equal to DISPATCH_LEVEL, i.e. when the scheduler is not available and paged-out pages can't be retrieved from the backing store. The obvious solution is to lock the pages by calling MmProbeAndLockPages. Is this a bulletproof solution? The answer is NO. The driver will cause intermittent system crashes with a stack like the one shown below


The reason is that the system pool returns a page to the list of free pages, and the system does not expect that page to be locked. This happens when the last allocation on a page has been released, so the page no longer contains valid allocations and can be returned to the system's list of free pages. The system assumes that every pool allocation that has been locked is unlocked by calling MmUnlockPages before it is freed by calling ExFreePool.
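
The ordering rule above can be illustrated with a toy user-mode model. This is only a sketch of the invariant, not Windows code: toy_page, toy_alloc, toy_lock, toy_unlock and toy_free are hypothetical names that stand in for a pool page, ExAllocatePool, MmProbeAndLockPages, MmUnlockPages and ExFreePool.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a single pool page; none of these names are WDK APIs. */
typedef struct {
    unsigned alloc_count;   /* live pool allocations carved from this page */
    unsigned lock_count;    /* outstanding locks taken on this page        */
    bool     on_free_list;  /* page has been handed back to the system     */
} toy_page;

/* Stands in for a pool allocation landing on this page. */
void toy_alloc(toy_page *p)
{
    p->on_free_list = false;
    p->alloc_count++;
}

/* Stand in for locking and unlocking the page. */
void toy_lock(toy_page *p)   { p->lock_count++; }
void toy_unlock(toy_page *p) { assert(p->lock_count > 0); p->lock_count--; }

/* Stands in for freeing one allocation. Returns false in the case the post
 * describes: the last allocation is gone, the page goes back to the free
 * list, and it is still locked. */
bool toy_free(toy_page *p)
{
    assert(p->alloc_count > 0);
    if (--p->alloc_count != 0)
        return true;            /* other allocations keep the page in the pool */
    if (p->lock_count != 0)
        return false;           /* locked page returned to the free list */
    p->on_free_list = true;
    return true;
}
```

In this model a free fails only when it releases the last allocation on a page that is still locked, which matches the intermittent nature of the crashes: as long as other allocations live on the same page, the stray lock goes unnoticed.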

Tuesday, March 25, 2014

MacOS X hibernate path and preemption

Below is a MacOS X 10.9 call stack captured while processing a request to hibernate

mach_kernel`IOPMrootDomain::pmStatsRecordEvent() + 227 at IOPMrootDomain.cpp:6969
mach_kernel`hibernate_machine_init + 3160 at IOHibernateIO.cpp:3112
mach_kernel`acpi_sleep_kernel() + 503 at acpi.c:320
AppleACPIPlatform`AppleACPIPlatformExpert::sleepPlatform() + 443
AppleACPIPlatform`AppleACPICPU::haltCPU() + 117
mach_kernel`IOCPUSleepKernel() + 764 at IOCPU.cpp:403
mach_kernel`IOPMrootDomain::powerChangeDone() + 531 at IOPMrootDomain.cpp:2256
mach_kernel`IOService::all_done() + 1221 at IOServicePM.cpp:4269
mach_kernel`IOService::servicePMRequest(IOPMRequest*, IOPMWorkQueue*) [inlined] IOService::OurChangeFinish() + 8 at IOServicePM.cpp:4736
mach_kernel`IOService::servicePMRequest() + 2773 at IOServicePM.cpp:7337
mach_kernel`IOPMWorkQueue::checkRequestQueue() + 52 at IOServicePM.cpp:8236
mach_kernel`IOPMWorkQueue::checkForWork() + 127 at IOServicePM.cpp:8296
mach_kernel`IOWorkLoop::runEventSources() + 258 at IOWorkLoop.cpp:367
mach_kernel`IOWorkLoop::threadMain() + 195 at IOWorkLoop.cpp:395

The call to IOPMrootDomain::pmStatsRecordEvent is made with preemption disabled, i.e. thread scheduling is not allowed. However, IOPMrootDomain::pmStatsRecordEvent calls IORegistryEntry::setProperty, which acquires a mutex, and under contention a mutex acquisition can block the calling thread and invoke the scheduler. So, is this a bug in the Apple code? I do not know, but there is a workaround in the Apple code that keeps a debug kernel build from panicking when IORegistryEntry::setProperty is called with preemption disabled: the check cmpl $0,%gs:CPU_HIBERNATE bypasses the whole CHECK_PREEMPTION_LEVEL macro if the CPU is in hibernating mode. This makes sense if there are no other active CPUs in the system.

An excerpt from i386_locks.s:
 * If one or more simplelocks are currently held by a thread,
 * an attempt to acquire a mutex will cause this check to fail
 * (since a mutex lock may context switch, holding a simplelock
 * is not a good thing).
cmpl $0,%gs:CPU_HIBERNATE ; \
jne 1f ; \
cmpl $0,%gs:CPU_PREEMPTION_LEVEL ; \
je 1f ; \
movl %gs:CPU_PREEMPTION_LEVEL, %eax ; \
LOAD_ARG1(%eax) ; \
hlt ; \
.data ; \
2: String "preemption_level(%d) != 0!" ; \
.text ; \
#else /* MACH_RT */
#endif /* MACH_RT */
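
The control flow of the excerpt can be restated in C. This is only my reading of the assembly shown above, not Apple's code: toy_cpu_data and toy_preemption_check_fails are hypothetical names, and the two fields stand in for the per-CPU values read through %gs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU data addressed via %gs. */
typedef struct {
    int hibernate;          /* models %gs:CPU_HIBERNATE        */
    int preemption_level;   /* models %gs:CPU_PREEMPTION_LEVEL */
} toy_cpu_data;

/* Returns true when the excerpt would fall through to the code that
 * reports "preemption_level(%d) != 0!" and halts. */
bool toy_preemption_check_fails(const toy_cpu_data *cpu)
{
    if (cpu->hibernate != 0)         /* cmpl $0,%gs:CPU_HIBERNATE ; jne 1f        */
        return false;                /* hibernating CPU: the check is skipped     */
    if (cpu->preemption_level == 0)  /* cmpl $0,%gs:CPU_PREEMPTION_LEVEL ; je 1f  */
        return false;                /* preemption enabled: nothing to report     */
    return true;                     /* preemption disabled outside hibernation   */
}
```

With hibernate set, the function returns false regardless of the preemption level, which is the bypass described above.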

Monday, March 24, 2014

Windows Object Manager, Paged Pool and elevated IRQL

Surprisingly, the Windows 8 Object Manager allocates some objects from the paged pool. This means that ObReferenceObject and ObDereferenceObject can't be safely called at DISPATCH_LEVEL: the actual maximum IRQL becomes APC_LEVEL if an object is allocated from the paged pool. For example, a token object might come from the paged pool, as the !pool command shows

1: kd> !pool ffffc00002b73770
Pool page ffffc00002b73770 region is Paged pool
*ffffc00002b73740 size:  8c0 previous size:  1c0  (Allocated) *Toke
Pooltag Toke : Token objects, Binary : nt!se

The object itself (the pointer count is rather large, but this is nevertheless a valid object)

1: kd> !object ffffc00002b737a0
Object: ffffc00002b737a0  Type: (ffffe00000153db0) Token
    ObjectHeader: ffffc00002b73770 (new version)
    HandleCount: 33  PointerCount: 131067

Driver Verifier was active and cleared the valid bit in the PTE that maps the paged pool page on which the object was allocated

1: kd> !pte ffffc00002b737a0
                                           VA ffffc00002b737a0
PXE at FFFFF6FB7DBEDC00    PPE at FFFFF6FB7DB80000    PDE at FFFFF6FB700000A8    PTE at FFFFF6E000015B98
contains 000000000134F863  contains 0000000001DCE863  contains 00000001257C2863  contains FB40000129FE9882
pfn 134f      ---DA--KWEV  pfn 1dce      ---DA--KWEV  pfn 1257c2    ---DA--KWEV  not valid
                                                                                  Transition: 129fe9
                                                                                  Protect: 4 - ReadWrite
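
The PTE value in the !pte output can also be decoded by hand. The sketch below assumes the commonly published x64 layout of a transition PTE (valid in bit 0, protection in bits 5 to 9, prototype in bit 10, transition in bit 11, page frame number in bits 12 to 47). That layout is my assumption, not something the debugger output states, and it may differ between Windows versions; toy_pte_fields and toy_decode_pte are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of interest from a 64-bit PTE value (assumed layout, see above). */
typedef struct {
    bool     valid;       /* bit 0        */
    unsigned protection;  /* bits 5..9    */
    bool     prototype;   /* bit 10       */
    bool     transition;  /* bit 11       */
    uint64_t pfn;         /* bits 12..47  */
} toy_pte_fields;

toy_pte_fields toy_decode_pte(uint64_t pte)
{
    toy_pte_fields f;
    f.valid      = (pte & 1u) != 0;
    f.protection = (unsigned)((pte >> 5) & 0x1F);
    f.prototype  = ((pte >> 10) & 1u) != 0;
    f.transition = ((pte >> 11) & 1u) != 0;
    f.pfn        = (pte >> 12) & 0xFFFFFFFFFULL;  /* 36-bit page frame number */
    return f;
}
```

Applied to FB40000129FE9882, this gives valid = 0, transition = 1, protection = 4 and pfn = 0x129FE9, in agreement with the "not valid", "Transition: 129fe9" and "Protect: 4" lines printed by the debugger.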

The PTE was marked as invalid even though the physical page still contains valid data and has not been reused or paged out. The page fault handler will set the valid bit again when it processes a page fault on this address (this is called a soft page fault, as there is no I/O from the backing store), but a page fault cannot be resolved at DISPATCH_LEVEL, so calling ObDereferenceObject on this object at DISPATCH_LEVEL crashes the system

TRAP_FRAME:  ffffd000201fc800 -- (.trap 0xffffd000201fc800)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000005 rbx=0000000000000000 rcx=ffffc00002b737a0
rdx=0000000000000005 rsi=0000000000000000 rdi=0000000000000000
rip=fffff803b20565a3 rsp=ffffd000201fc990 rbp=fffff800017bf594
 r8=0000000000000007  r9=fffff800017debac r10=0000000000000000
r11=ffffd000201fcc70 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
fffff803`b20565a3 f0480fc15ed0    lock xadd qword ptr [rsi-30h],rbx ds:ffffffff`ffffffd0=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff803b21f10ea to fffff803b216f890

<here is an offending driver ))))>

Monday, March 17, 2014

A spectrum of a 2.4 GHz WiFi

Just for fun, below is a picture of the 2.4 GHz WiFi spectrum captured by a GWInstek-730 spectrum analyzer at my house

The bar lines don't represent WiFi channels; the channels are much wider, at least 20 MHz. The bar lines are just an artifact of a sweep-tuned spectrum analyzer capturing a fast-changing signal. I believe that the GWInstek-730 is an analogue sweep-tuned analyzer that does not perform a DFT / FFT of the signal.
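
To put the channel width into perspective, here is a small sketch of the 2.4 GHz channel plan. It relies on the standard numbering, which is not taken from the measurement above: channels 1 to 13 are centered at 2407 + 5 * n MHz and channel 14 at 2484 MHz. The function names are hypothetical.

```c
/* Center frequency in MHz of a 2.4 GHz WiFi channel, or 0 if out of range. */
int wifi24_center_mhz(int channel)
{
    if (channel >= 1 && channel <= 13)
        return 2407 + 5 * channel;
    if (channel == 14)
        return 2484;
    return 0;
}

/* Nonzero if two valid channels of the given width overlap in frequency. */
int wifi24_channels_overlap(int a, int b, int width_mhz)
{
    int fa = wifi24_center_mhz(a);
    int fb = wifi24_center_mhz(b);
    int distance = fa > fb ? fa - fb : fb - fa;
    return fa != 0 && fb != 0 && distance < width_mhz;
}
```

With 20 MHz wide channels spaced only 5 MHz apart, neighbouring channels overlap heavily, while channels 1 and 6 (2412 and 2437 MHz) do not. A channel therefore shows up on the analyzer as a broad hump rather than as a single narrow bar.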