Blog

Fuzzing Image Parsing in Windows, Part Four: More HEIF

Dhanesh Kizhakkinan
Jul 05, 2022
7 min read

Continuing our discussion of image parsing vulnerabilities in the Windows HEIF codec, we take a look at analyzing a new crash, reconstructing function symbols, and the root cause analysis of the vulnerability, CVE-2022-24457. This vulnerability is present on a default install of Windows 10 and 11 and only requires browsing to a folder containing the malicious image file to trigger the vulnerability. The vulnerability is triggered when Windows attempts to automatically generate a thumbnail for the image. All vulnerabilities have been remediated by Microsoft following the disclosure by Mandiant.

The Crash - CVE-2022-24457

Figure 1: Crash details
Figure 1: Crash details

The crash in Figure 1 is an out of bound memory write within an AVX2 instruction; note that the crash function is considerably large and contains a notable series of AVX2 instructions. After a quick look around the decompilation, the operations, and internal calls to memcpy functions, we can deduce this as an AVX2 optimized version of memcpy. An out-of-bounds write inside the memcpy function is a good crash for further analysis.

Identifying other functions

With the crash function identified as memcpy, we next try to identify other functions in the binary. For this, we go one frame up the call stack and look at the decompilation in Figure 2. We can see that this decompliation contains a call to sub_18017dd88, which appears to be a logging function with the function name as its second parameter.

Figure 2: Decompilation showing debug logging call
Figure 2: Decompilation showing debug logging call

Some software ships with logging capabilities which can be helpful for analyzing crashes and performance issue in a production environment. In this case, the logging calls can help us to reconstruct multiple function names and understand the implemented functionality of these functions, which in turn helps us to easily determine the root cause of vulnerabilities. Looking at the cross references, we can see over 5000 calls to this logging function (see Figure 3).

Figure 3: Cross references to logging function
Figure 3:  Cross references to logging function

Given the large number of logging calls, we write a script which recovers the function names from the second argument and renames the caller functions.

The IDA Script

An IDAPython script was written to automate the process. It works according to the following algorithm.

  1. Get list of cross references (xrefs) to the logging function
  2. Get unique caller functions from the list of xrefs
  3. Decompile each caller
  4. Find the logging function call and retrieve the second argument as the function name
  5. Rename the caller function with the retrieved function name

I decided to write a generic script which can be reused in other projects. For that, I used IDA’s decompiler API to avoid processor and calling convention specific code. The script also makes use of our decompiler wrapper FIDL. The script is provided in Table 1.

Table 1: Function renamer IDAPython script

from idc import *
from idaapi import *
from idautils import *
import FIDL.decompiler_utils as du

# f_name: The logging function name
# indx: 0 based argument index to retrieve
def rename(f_name, indx):
    f_ea = get_name_ea_simple(f_name)
    if f_ea == BADADDR:
        print("Failed to resolve address for {}".format(f_name))
        return

    callers = set()
    # Get a set of unique callers
    for ref in XrefsTo(f_ea, True):
        if not ref.iscode:
            continue
        
        f = get_func(ref.frm)
        if f is None:
            continue
            
        f_ea = f.start_ea
        callers.add(f_ea)
    
    for caller_ea in callers:
        current_fname = get_func_name(caller_ea)
        # Rename only if the function name starts with sub_
        if current_fname.startswith("sub_"):
            c = du.find_all_calls_to_within(f_name, caller_ea) 
            try:
                # Validate the logging function and arguments
                if len(c) > 0 and len(c[0].args) > indx:
                    f_name_str = c[0].args[indx].val
                    set_name(caller_ea, "{}".format(f_name_str), SN_FORCE)
                else:
                    print("Failed in {}\n".format(current_fname))
            except:
                print("Exception in {}\n".format(current_fname))

rename('sub_18017DD88', 1)

With thousands of functions renamed in IDA, it gets considerably easier to do a full root-cause analysis. Even though we use IDA for static analysis, WinDBG + Time Travel Debugging (TTD) is regularly used for most of the dynamic analysis. We port our renamed symbols into WinDBG by using an IDA plugin: FakePDB. FakePDB creates a PDB from the IDA database, which can be loaded in WinDBG to enhance our debugging/tracing capabilities. An example is shown in Figure 4.

Figure 4: Ported symbols in WinDBG
Figure 4: Ported symbols in WinDBG

Root cause analysis of the bug

The relevant code from the function msheif_store!CHEIFStreamReader::ReadItemData is presented in Table 2.

Table 2: Vulnerable code in function ReadItemData

/*
    length calculation from CHEIFItemInfoEntry::GetDataSize
    0x309 + 0xee7        => 0x11f0
    0x11f0 + 0xfffff200    => 0x1000003f0
*/

QWORD currentOffset = 0;
QWORD length = 0x1000003f0; // CHEIFItemInfoEntry::GetDataSize
status = MFCreateMemoryBuffer(length, allocBuff);
if (status  < 0)
{
    // bail
}
while (1)
{
    ...
    // 0xee7 + 0x0 >= 0x1000003f0
    if (currentSize + currentOffset >= length)
    {
        // bail
    }
    // crash
    OptimizedMemcpy(currentOffset + allocBuff, srcBuff, currentSize);
    currentOffset += currentSize;
    ...
}

Astute readers will quickly point at the if condition for a possible integer overflow scenario. But in this case, such a scenario is mitigated while calculating the length in CHEIFItemInfoEntry::GetDataSize. The vulnerability is only visible when we look closely at the MFCreateMemoryBuffer function and its parameters. Figure 5 shows the function’s documentation from MSDN.

 Figure 5: MFCreateMemoryBuffer function documentation
 Figure 5: MFCreateMemoryBuffer function documentation

The MFCreateMemoryBuffer function accepts a 32-bit DWORD as the length parameter and returns an allocated buffer. But if we look at Table 2, we can see that the length parameter passed to the function is a 64-bit QWORD. In such an instance, the compiler decides to truncate the QWORD to DWORD. In this case, length 0x1000003f0 gets truncated to much smaller 0x3f0. This allocates a smaller buffer and larger data gets copied into the buffer, causing the out-of-bounds write.

From the function names CHEIFStreamReader::ReadItemData, we guess that the vulnerability occurs while trying to parse the item box. Further backtracing the calls, we see the lengths are read from the function CItemLocationAtom::ParseAtom, which points to the iloc box shown in Figure 6.

Figure 6: iloc box
Figure 6: iloc box

Looking at the box content, we can see all the three length values (0x309, 0xEE7 and 0xFFFFF200) specified in the box. Now we can look at the iloc specification to figure out the exact details for those lengths. HEIF is based on ISO Base Media File Format (ISOBMFF) but getting the right specification with box parsing algorithms tends to be complicated or paywalled.

Another approach we can try, is to look at open-source implementations of HEIF image parsers such as libheif or nokiatech-heif. Running our PoC file through decoding routines gives us the exact details of the lengths from the iloc box as shown in Figure 7.

Figure 7: iloc parsing in a debugger
Figure 7: iloc parsing in a debugger

The three lengths we see in our PoC file are called extent lengths. Microsoft’s HEIF implementation reads all the extent lengths and adds them together before the resulting length is used in allocating memory through the API function MFCreateMemoryBuffer. This API truncates the lengths to a DWORD and allocates a smaller buffer, causing the out-of-bounds write.

Patch

Microsoft patched this vulnerability in March 2022 by bailing out with an error if the total length is greater than 0xC8000000 (~3GiB).

Conclusion

Part four of this blog series presents a vulnerability in Microsoft’s HEIF decoder and shows how to reconstruct symbols to do a full root-cause analysis of the vulnerability. A list of latest reported vulnerabilities in HEIF codec can be found in the following appendix and found referenced in the Mandiant Vulnerability Disclosures.

Appendix

CVE id

Submitted Date

Fixed Date

Vulnerability type

CVE-2022-22007

22-April-2021

08-March-2022

Heap overflow

CVE-2022-21926

13-September-2021

08-February-2022

Heap overflow

CVE-2022-21917

17-September-2021

11-January-2022

Heap overflow

CVE-2022-21927

17-September-2021

08-February-2022

Heap overflow

CVE-2022-22006

17-September-2021

08-March-2022

Heap overflow

CVE-2022-21844

23-September-2021

08-February-2022

Heap overflow

CVE-2022-24453

19-October-2021

08-March-2022

Heap overflow

CVE-2022-24457

19-October-2021

08-March-2022

Heap overflow

CVE-2022-24456

14-November-2021

08-March-2022

Heap overflow

CVE-2022-24532

04-December-2021

12-April-2022

Heap overflow