Thursday, February 27, 2014

What is in a name? ( of a process )

What does PsGetCurrentProcess return?

The answer: it returns the thread's process. But did you know that a thread might have TWO processes? The first one is the process the thread belongs to (its owning process). The second one is a process to which the thread has been attached by KeStackAttachProcess. Which one does PsGetCurrentProcess return? It returns the attached process if the thread is attached, or the owning process otherwise.

This raises a question: how do you get the owning process? The answer is IoThreadToProcess, e.g. IoThreadToProcess( PsGetCurrentThread() ) for the current thread.

The other question: what does "attach to a process" mean? It means that the thread operates in the address space of the attached process (i.e. the page directory, and so CR3, is switched). As a result, any function that operates on the user-mode part of the address space changes or fetches data in the attached process. The notion of an "attached process" is meaningful only while a thread executes in kernel mode. The system space is nearly completely shared between all processes, so switching page tables has no serious impact on accessing the system space.
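To make the distinction concrete, here is a small user-mode C model (not kernel code). The structures and functions below are hypothetical stand-ins for the thread object, KeStackAttachProcess, KeUnstackDetachProcess, PsGetCurrentProcess and IoThreadToProcess. Their fields do not reflect the real KTHREAD layout. The model assumes attaches nest like a stack, with each attach saving the previous state in a caller-provided structure, in the spirit of KAPC_STATE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types, not the real EPROCESS/KTHREAD layout. */
typedef struct model_process {
    const char *name;
} model_process;

typedef struct model_thread {
    model_process *owning;   /* fixed for the thread's lifetime */
    model_process *current;  /* address space in use; == owning unless attached */
} model_thread;

/* Saved state for a stack-style attach, in the spirit of KAPC_STATE. */
typedef struct model_apc_state {
    model_process *saved_current;
} model_apc_state;

/* Stand-in for IoThreadToProcess: always the owning process. */
model_process *model_thread_to_process(const model_thread *t) {
    return t->owning;
}

/* Stand-in for PsGetCurrentProcess: the process whose address space
   the thread is using right now (the attached one, if any). */
model_process *model_current_process(const model_thread *t) {
    return t->current;
}

/* Stand-in for KeStackAttachProcess: remember the old address space and
   switch to p (the real kernel reloads CR3 at this point). */
void model_stack_attach(model_thread *t, model_process *p, model_apc_state *s) {
    assert(t != NULL && p != NULL && s != NULL);
    s->saved_current = t->current;
    t->current = p;
}

/* Stand-in for KeUnstackDetachProcess: restore the saved address space. */
void model_unstack_detach(model_thread *t, const model_apc_state *s) {
    t->current = s->saved_current;
}
```

Nested attaches unwind in reverse order, and the owning process reported by the IoThreadToProcess stand-in never changes.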

The notion of attaching is much more profound in 32-bit Mac OS X or iOS. There, every process has the full 4 GB virtual address space: there is no split into system and user space, and the CR3 register is reloaded when a thread switches to kernel mode. The 32-bit Mac OS X kernel cannot access user space through a pointer. To access user space, the kernel (or a kernel module) calls functions that reach it by switching CR3. In 64-bit Mac OS X or iOS the address space is divided into user space and kernel space, so access by pointer becomes possible. Apple discourages it, though, and it crashes the system in debug mode, where CR3 is reloaded whenever a thread enters kernel mode.

Wednesday, February 26, 2014

Outswapped kernel stack

You probably know that a kernel stack can be outswapped if certain conditions are met. One such condition is waiting with WaitMode set to UserMode.

If an event is allocated on a kernel stack, it can be outswapped when a driver does something like this:

// Lock (a KSPIN_LOCK) and Operation are assumed to be driver globals

VOID GetOperationCompletionStatus( ... )
{
    KEVENT    Event;
    KIRQL     OldIrql;
    NTSTATUS  WaitStatus;
    BOOLEAN   Wait = FALSE;

    KeInitializeEvent( &Event, SynchronizationEvent, FALSE );

    KeAcquireSpinLock( &Lock, &OldIrql );
    {
       if( FALSE == Operation->Completed ){
          Operation->CompletionEvent = &Event;
          Wait = TRUE;
       }
    }
    KeReleaseSpinLock( &Lock, OldIrql );

    while( Wait ){

       // allow a user to wake up the thread when terminating the
       // process, but note that the stack might be outswapped
       // when the thread is blocked waiting for the event
       WaitStatus = KeWaitForSingleObject( &Event,
                                           Executive,
                                           UserMode,
                                           FALSE,
                                           NULL );

       if( STATUS_SUCCESS == WaitStatus ){
          // NotifyOfCompletion has set the event and
          // already cleared Operation->CompletionEvent
          Wait = FALSE;
       } else {
          KeAcquireSpinLock( &Lock, &OldIrql );
          {
             // if NULL then NotifyOfCompletion has already set
             // the event, so go back to waiting to consume it
             if( Operation->CompletionEvent ){
                Operation->CompletionEvent = NULL;
                Wait = FALSE;
             }
          }
          KeReleaseSpinLock( &Lock, OldIrql );
       }
    }
}


VOID NotifyOfCompletion( VOID )
{
    KIRQL OldIrql;

    KeAcquireSpinLock( &Lock, &OldIrql );
    {
        Operation->Completed = TRUE;
        if( Operation->CompletionEvent ){

           // the following call sometimes crashes the system
           // when it tries to access an outswapped page
           KeSetEvent( Operation->CompletionEvent,
                       IO_NO_INCREMENT,
                       FALSE );
           Operation->CompletionEvent = NULL;
        }
    }
    KeReleaseSpinLock( &Lock, OldIrql );
}

The reason for waiting in UserMode is to be slightly more gentle and allow a user to terminate a waiting thread. This is a common scenario for distributed file systems, where the response time might be up to minutes.

The problem with the above code is that the kernel stack can be outswapped while the thread waits in KeWaitForSingleObject with WaitMode set to UserMode. The call to KeSetEvent then accesses the event on the outswapped stack at IRQL DISPATCH_LEVEL, where a page fault cannot be resolved. This has nothing to do with the call to KeAcquireSpinLock. The same happens even if you call KeSetEvent without raising IRQL, as KeSetEvent raises IRQL itself while working with the event.

If you check the event address on an outswapped stack with WinDbg, you will see:

1: kd> !pte 0xaf792cac
                    VA af792cac
PDE at C0602BD8            PTE at C057BC90
contains 000000005402F863  contains 00000000A5129BE2
pfn 5402f     ---DA--KWEV  not valid
                            Transition: a5129

                            Protect: 1f - Outswapped kernel stack

The solution for the above example is to allocate the event from the NonPaged pool, which is always resident, and to free it once the wait is over.
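The handshake with the event moved off the stack can be modeled in user-mode C as follows. This is a hypothetical, single-threaded sketch rather than driver code:

- calloc/free stand in for ExAllocatePoolWithTag(NonPagedPool, ...) and ExFreePoolWithTag.
- A bool stands in for the KEVENT.
- The spin lock is omitted because nothing runs concurrently in the model.

What the sketch shows is the ownership rule. The completer touches the event only while it is published, and the waiter frees it only after it has been unpublished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; in the driver the bookkeeping in each
   function below runs under the spin lock. */
typedef struct {
    bool signaled;
} model_event;

typedef struct {
    bool completed;
    model_event *completion_event;
} model_operation;

/* Waiter: allocate the event off the stack (heap here, nonpaged pool in
   the driver) and publish it. Returns NULL if no wait is needed. */
model_event *waiter_begin(model_operation *op) {
    model_event *ev = calloc(1, sizeof *ev);
    if (ev == NULL)
        abort();                 /* model only: a driver would fail the request */
    if (op->completed) {
        free(ev);
        return NULL;
    }
    op->completion_event = ev;
    return ev;
}

/* Completer: mark completion, then signal and unpublish the event. */
void notify_completion(model_operation *op) {
    op->completed = true;
    if (op->completion_event != NULL) {
        op->completion_event->signaled = true;
        op->completion_event = NULL;
    }
}

/* Interrupted waiter: returns true if it withdrew the event and may leave.
   false means the completer has already signaled it, so waiting again
   succeeds immediately. */
bool waiter_withdraw(model_operation *op, model_event *ev) {
    if (op->completion_event == ev) {
        op->completion_event = NULL;
        return true;
    }
    assert(ev->signaled);
    return false;
}

/* Waiter: free the event only when nobody else can reference it. */
void waiter_end(model_operation *op, model_event *ev) {
    assert(op->completion_event != ev);
    free(ev);
}
```

Because the event no longer lives on the kernel stack, outswapping the waiter's stack cannot affect the completer.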

Monday, February 10, 2014

How IoCancelFileOpen works

The WDK documentation says:

"IoCancelFileOpen sets the FO_FILE_OPEN_CANCELLED flag in the Flags member of the file object that FileObject points to. This flag indicates that the IRP_MJ_CREATE request has been canceled, and an IRP_MJ_CLOSE request will be issued for this file object."

But this does not tell the full story. First of all, IoCancelFileOpen issues IRP_MJ_CLEANUP and only then sets the FO_FILE_OPEN_CANCELLED flag. Also, IoCancelFileOpen checks that no handles have been created for the file object; if this check fails, the system crashes itself with KeBugCheck. Here you should say

  ... Wait a minute! What about IRP_MJ_CLEANUP being sent? Isn't it sent only for file objects with handles?

The answer is NO. The system always sends IRP_MJ_CLEANUP for every file object. If no handles were created for a file object, the IRP_MJ_CLEANUP request is sent by IopDeleteFile (called by ObDereferenceObject) before issuing IRP_MJ_CLOSE. Now it should be clear why IoCancelFileOpen does not send IRP_MJ_CLOSE itself: it is sent by IopDeleteFile, called by ObDereferenceObject.

Let's now turn our focus to FO_FILE_OPEN_CANCELLED. What is this flag for? IoCreateFile uses it to decide how to reclaim file object resources when IoCallDriver returns an error.

- If the flag is set, ObDereferenceObject (and so IopDeleteFile) is called for the file object, and the file system and attached filters receive a close request.
- If the flag is not set, the DeviceObject member of the file object is set to NULL. Neither cleanup nor close requests are then sent when ObDereferenceObject or IopDeleteFile is called to reclaim the memory occupied by the file object.

The latter is the case of an error returned by the lowest driver in the stack, i.e. the file system driver.
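The reclamation decision described above can be sketched as a user-mode C model. Everything in it is hypothetical:

- MODEL_FO_FILE_OPEN_CANCELLED is not the WDK's FO_FILE_OPEN_CANCELLED value.
- The structure is not a FILE_OBJECT.
- To keep the model small, only the IRP_MJ_CLOSE requests that reach the stack are counted.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FO_FILE_OPEN_CANCELLED 0x1u   /* hypothetical value */

typedef struct {
    unsigned flags;
    void *device_object;   /* NULL means "send no IRPs on teardown" */
    int close_irps_sent;   /* close requests that reached the driver stack */
} model_file_object;

/* Stand-in for IopDeleteFile: the stack sees a close only if
   DeviceObject is still set. */
void model_delete_file(model_file_object *fo) {
    assert(fo != NULL);
    if (fo->device_object != NULL)
        fo->close_irps_sent++;
}

/* Stand-in for the error path in IoCreateFile after IoCallDriver failed. */
void model_reclaim_after_failed_create(model_file_object *fo) {
    assert(fo != NULL);
    if ((fo->flags & MODEL_FO_FILE_OPEN_CANCELLED) == 0u) {
        /* The file system itself failed the create: detach the file
           object from the stack so no cleanup/close is sent. */
        fo->device_object = NULL;
    }
    /* In both cases the object is dereferenced, ending in IopDeleteFile. */
    model_delete_file(fo);
}
```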

Below is a call stack for create request processing in which an attached filter called IoCancelFileOpen, resulting in a close request being sent from ObfDereferenceObject:

nt!IofCallDriver+0x3f
nt!IopDeleteFile+0xef
nt!ObpRemoveObjectRoutine+0x43
nt!ObfDereferenceObjectWithTag+0x5c
nt!ObfDereferenceObject+0xd
nt!IopParseDevice+0x167a
nt!ObpLookupObjectName+0x251
nt!ObOpenObjectByName+0xfe
nt!IopCreateFile+0x2a5
nt!IoCreateFileEx+0x88
nt!IoCreateFileSpecifyDeviceObjectHint+0x59

Saturday, February 8, 2014

Irp->UserBuffer mess

Have you ever asked yourself whether a driver should check Irp->UserBuffer with ProbeForRead/ProbeForWrite depending on the previous mode (UserMode or KernelMode)?
The answer is no, it should not, as the I/O Manager has already done this in NtReadFile, NtWriteFile or NtDeviceIoControlFile. The exception is METHOD_NEITHER IOCTLs, whose buffers the I/O Manager passes through unprobed. This means it is legitimate to receive an Irp->UserBuffer pointing to kernel space when the previous mode is UserMode, e.g. when a third-party driver issues an IRP or substitutes the buffer.

Some file system drivers wrongly use Irp->RequestorMode to decide whether to call ProbeForRead/ProbeForWrite; rdpdr.sys, for example, does this. In that case the best tactic is to change Irp->RequestorMode to KernelMode if the buffer is replaced with one allocated in the system address space. But do not forget to restore the original buffer, MdlAddress and RequestorMode in your completion routine, or the system might BSOD while completing the IRP.
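Here is a minimal sketch of that save/restore discipline. It assumes a filter that keeps the original values in a per-IRP context handed to its completion routine. The structures are hypothetical stand-ins for the IRP fields involved (UserBuffer, MdlAddress, RequestorMode). Copying data back and freeing the substitute buffer and MDL are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODEL_KERNEL_MODE, MODEL_USER_MODE } model_mode;

/* Hypothetical subset of the IRP fields touched by the substitution. */
typedef struct {
    void *user_buffer;
    void *mdl_address;
    model_mode requestor_mode;
} model_irp;

/* Original values saved by the filter (hypothetical per-IRP context). */
typedef struct {
    void *user_buffer;
    void *mdl_address;
    model_mode requestor_mode;
} model_swap_context;

/* Before passing the IRP down: save the originals, substitute a
   system-space buffer and claim KernelMode so that drivers probing
   on RequestorMode do not reject the kernel address. */
void model_swap_buffer(model_irp *irp, model_swap_context *ctx,
                       void *system_buffer, void *system_mdl) {
    assert(irp != NULL && ctx != NULL);
    ctx->user_buffer = irp->user_buffer;
    ctx->mdl_address = irp->mdl_address;
    ctx->requestor_mode = irp->requestor_mode;
    irp->user_buffer = system_buffer;
    irp->mdl_address = system_mdl;
    irp->requestor_mode = MODEL_KERNEL_MODE;
}

/* In the completion routine: restore every field before the IRP
   continues completing up the stack. */
void model_restore_buffer(model_irp *irp, const model_swap_context *ctx) {
    assert(irp != NULL && ctx != NULL);
    irp->user_buffer = ctx->user_buffer;
    irp->mdl_address = ctx->mdl_address;
    irp->requestor_mode = ctx->requestor_mode;
}
```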

P.S. Though I am talking here about substituting buffers for read and write requests, I strongly discourage using this technique to implement an FSD filter that transparently changes file data. This approach will not work in general. The only viable solution in that case is the so-called FSD over FSD implementation; I will discuss this approach in a future post.
 Another word of caution: you must use SEH (try/except) while accessing a buffer for an IRP whose previous mode is UserMode. It does no harm if the buffer happens to be in kernel space because of an upper filter's actions.

On importance of AccessMode for MmProbeAndLockPages

There is one issue with MmProbeAndLockPages that is sometimes overlooked and results in a subtle, hard-to-track bug. I am talking about the Operation parameter (IoReadAccess, IoWriteAccess or IoModifyAccess), which is easy to confuse with AccessMode. The value of this parameter determines whether the page descriptors are marked dirty when MmUnlockPages is called.

It might sound surprising, but for memory locked with MmProbeAndLockPages the kernel does not use the dirty flag in the PTE to track modified pages. There are several reasons for this; one is that the pages might never be mapped in any address space, e.g. when they are used for DMA I/O. The kernel employs a simple approach: if the pages were locked with IoWriteAccess or IoModifyAccess, it marks the page descriptors as dirty when MmUnlockPages is called, even if no data was actually transferred to the pages.

As a consequence, the Memory Manager will try to flush the modified pages to storage if they belong to a mapped file. This might surprise both you and the file system driver, especially if it is a read-only file system.
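The bookkeeping can be modeled as follows. This is a hypothetical user-mode sketch, not the Memory Manager's code. The write_operation field plays the role of what is recorded in the MDL at lock time (the MDL_WRITE_OPERATION flag appears to serve this purpose). The dirty field stands for the modified state tracked in the page's PFN entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    MODEL_IO_READ_ACCESS,
    MODEL_IO_WRITE_ACCESS,
    MODEL_IO_MODIFY_ACCESS
} model_lock_operation;

typedef struct {
    bool dirty;        /* stand-in for the modified state in the PFN entry */
    int lock_count;
} model_page;

typedef struct {
    model_page **pages;
    size_t page_count;
    bool write_operation;   /* remembered at lock time */
} model_mdl;

/* Stand-in for MmProbeAndLockPages: records whether the lock was for writing. */
void model_probe_and_lock(model_mdl *mdl, model_lock_operation op) {
    mdl->write_operation = (op != MODEL_IO_READ_ACCESS);
    for (size_t i = 0; i < mdl->page_count; i++)
        mdl->pages[i]->lock_count++;
}

/* Stand-in for MmUnlockPages: pages locked for write or modify are marked
   dirty unconditionally; the PTE dirty bit is never consulted. */
void model_unlock_pages(model_mdl *mdl) {
    for (size_t i = 0; i < mdl->page_count; i++) {
        assert(mdl->pages[i]->lock_count > 0);
        if (mdl->write_operation)
            mdl->pages[i]->dirty = true;
        mdl->pages[i]->lock_count--;
    }
}
```

Nothing writes to the pages between lock and unlock in the model, yet a write lock still leaves them dirty. This is exactly what triggers the unexpected flush to a mapped file.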

Wednesday, February 5, 2014

An old ARM clocking scheme

An interesting fact about the old ARM clocking scheme ((c) ARM System-on-Chip Architecture):

Unlike the MU0 example presented in Section 1.3 on page 7, most ARMs do not
operate with edge-sensitive registers; instead the design is based around 2-phase
non-overlapping clocks, as shown in Figure 4.8, which are generated internally from
a single input clock signal. This scheme allows the use of level-sensitive transparent
latches. Data movement is controlled by passing the data alternately through latches
which are open during phase 1 and latches which are open during phase 2. The
non-overlapping property of the phase 1 and phase 2 clocks ensures that there are
no race conditions in the circuit.
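The two-phase scheme can be illustrated with a small C model. It is an assumption-laden sketch, not a description of ARM's actual clock generator:

- Both phases are derived from a single input clock sampled in ticks.
- A dead band of `gap` ticks keeps them from ever being high at the same time.
- A pair of transparent latches shows data advancing one latch per phase.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool phi1;
    bool phi2;
} phases;

/* Derive both phases from tick t of one input clock with an even period.
   phi1 is high in [gap, half), phi2 in [half + gap, period); the gap ticks
   before each rising edge are the non-overlap dead band. */
phases two_phase_clock(unsigned t, unsigned period, unsigned gap) {
    assert(period % 2u == 0u && gap < period / 2u);
    unsigned half = period / 2u;
    unsigned p = t % period;
    phases ph;
    ph.phi1 = p >= gap && p < half;
    ph.phi2 = p >= half + gap;
    return ph;
}

typedef struct {
    int a;   /* latch open during phase 1 */
    int b;   /* latch open during phase 2, fed by a */
} latch_pair;

/* Each transparent latch follows its input while its phase is high and
   holds otherwise. If both phases were high in the same step, a new input
   would pass through a straight into b; non-overlap prevents that race. */
void latch_pair_step(latch_pair *lp, phases ph, int input) {
    if (ph.phi1)
        lp->a = input;
    if (ph.phi2)
        lp->b = lp->a;
}
```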



Sunday, February 2, 2014

On Telegrapher's equations

I was always suspicious of the derivation of the telegrapher's equations based on a circuit model.

The way this model was derived was not convincing to me. Recently I stumbled across a derivation from Maxwell's equations, http://cc.ee.ntu.edu.tw/~rbwu/course/EM2/Lec1_Telegrapher_Eq.pdf . The two most important excerpts from the paper: