Cerbero Engine

What is Cerbero Engine?

Cerbero Engine is our premier solution for enterprise projects, including cloud-based and in-house services. It shares the same SDK as Cerbero Suite and has analyzed billions of files to date.

Security

Cerbero Engine is meticulously engineered to address a comprehensive range of security concerns when analyzing malicious files, including buffer overflows, integer overflows, infinite loops, infinite recursion, decompression bombs, and denial-of-service attacks, among others.

Cross-Platform

Mirroring the versatility of Cerbero Suite, Cerbero Engine is cross-platform. It’s available for both Windows (x86, x64) and Linux (x64) and maintains compatibility with earlier versions of both Windows and Linux.

Documentation

Our SDK comes with comprehensive documentation. Beyond covering the API, it delves into essential concepts and is enriched with numerous code examples.

Embedding

Cerbero Engine is designed for seamless integration: it functions as a Dynamic-Link Library (DLL) on Windows and a Shared Library on Linux. You can embed the engine using either C/C++ or Python 3.

Interacting with the engine from Python is straightforward:

from ProEngine import *

# initialize the engine
proEngineInit()

# from now on the SDK can be accessed
from Pro.PDF import *
# ...

# finalize the engine before exiting
proEngineFinal()
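
Because the engine must be finalized before exiting, it can help to pair the two calls so that finalization also happens when the code in between raises. The following is a minimal sketch, not part of the SDK: `engine_session` is a hypothetical helper, and the lambdas are stand-ins so the example runs without the engine installed. With the SDK available, `proEngineInit` and `proEngineFinal` would be passed instead.

```python
from contextlib import contextmanager

@contextmanager
def engine_session(init, final):
    """Call init() on entry and final() on exit, even if the body raises."""
    init()
    try:
        yield
    finally:
        final()

# Stand-ins (assumption): replace with proEngineInit / proEngineFinal
# when the engine is installed.
calls = []
with engine_session(lambda: calls.append("init"),
                    lambda: calls.append("final")):
    calls.append("work")
print(calls)
```

SDK imports such as `from Pro.PDF import *` would still go inside the `with` body, after initialization.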

Integrating the engine using C/C++ is equally straightforward: just include the ProEngine header and specify the engine’s disk location:

#define PRO_ENGINE_INIT
#include "ProEngine.h"

int main()
{
    /* initialize the engine */
    if (!proEngineInit("/path/to/the/engine", ProEngine_InitPython))
        return -1;

    /* from now on the SDK can be accessed */

    /* finalize the engine before exiting */
    proEngineFinal();
    return 0;
}

Simplicity

Our SDK strikes a balance between intuitiveness and adaptability. Take, for instance, extracting JavaScript code from a PDF document; it’s as simple as crafting a hook extension. The code snippet below will handle the task, regardless of whether the PDF is encrypted or nested within an archive.

from Pro.Core import *

def printJSEntry(sp, xml, tnode):
    # data node
    dnode = xml.findChild(tnode, "d")
    if not dnode:
        return
    # we let the scan engine extract the JavaScript for us
    params = NTStringVariantHash()
    params.insert("op", "js")
    idnode = xml.findChild(dnode, "id")
    if idnode:
        params.insert("id", int(xml.value(idnode), 16))
    ridnode = xml.findChild(dnode, "rid")
    if ridnode:
        params.insert("rid", int(xml.value(ridnode), 16))
    js = sp.customOperation(params)
    # print out the JavaScript
    print("JS CODE")
    print("-------")
    print(js)

def pdfExtractJS(sp, ud):
    xml = sp.getReportXML()
    # object node
    onode = xml.findChild(None, "o")
    if onode:
        # scan node
        snode = xml.findChild(onode, "s")
        if snode:
            # enumerate scan entries
            tchild = xml.firstChild(snode)
            while tchild:
                if xml.name(tchild) == "t":
                    # type attribute
                    tattr = xml.findAttribute(tchild, "t")
                    # check if it's a JavaScript entry
                    if tattr and int(xml.value(tattr)) == CT_JavaScript:
                        printJSEntry(sp, xml, tchild)
                tchild = xml.nextSibling(tchild)

Leveraging hooks and other extensions allows the scanning engine to shoulder the bulk of the workload.
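
The sibling-walking loop in the hook above can be factored into a small generator. This is a sketch under assumptions: `iter_children` is a hypothetical helper, not an SDK function, and it relies only on the `firstChild`, `nextSibling` and `name` calls already used in the hook, with missing nodes evaluating as false just as they do there.

```python
def iter_children(xml, parent, name=None):
    """Yield the children of parent, optionally filtered by element name.

    Uses only firstChild(), nextSibling() and name(), as in the hook above.
    """
    child = xml.firstChild(parent)
    while child:
        if name is None or xml.name(child) == name:
            yield child
        child = xml.nextSibling(child)

# Hypothetical usage inside pdfExtractJS:
#     for tchild in iter_children(xml, snode, "t"):
#         ...
```

Keeping the traversal in one place leaves the hook body with only the type check and the call to `printJSEntry`.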

Alternatively, the SDK can be used for custom parsing operations without relying on the scanning engine. For instance, the code snippet below parses a PDF document and prints the ID, the dictionary, and the decoded stream content of every PDF object. It also detects unreferenced objects, which can be found in corrupted or malicious PDF documents, and handles encrypted PDF documents.

# iterate through all the objects in a PDF
from ProEngine import *

def parsePDF(fname):
    # open the file
    c = createContainerFromFile(fname)
    if c.isNull():
        print("error: couldn't open file")
        return
    # load the file as PDF
    pdf = PDFObject()
    if not pdf.Load(c):
        print("error: invalid file format")
        return
    # parse all referenced objects
    objtable = pdf.BuildObjectTable()
    # detect unreferenced objects 
    # (corrupted or malicious PDFs may contain them)
    pdf.DetectObjects(objtable)
    # store the object table internally
    pdf.SetObjectTable(objtable)
    # process PDF encryption
    if not pdf.ProcessEncryption():
        print("warning: couldn't decrypt file")
    # [optional] sort objects by ID
    oids = []
    it = objtable.iterator()
    while it.hasNext():
        oid, _ = it.next()
        oids.append(oid)
    oids.sort()
    # iterate through the objects
    for oid in oids:
        # print out the object id
        print("\nOBJECT ID:", oid >> 32, "\n")
        # parse the object
        ret, dictn, content, info = pdf.ParseObject(objtable, oid)
        if not ret:
            print("warning: couldn't parse object %d" % (oid,))
            continue
        # print out the object dictionary
        it = dictn.iterator()
        while it.hasNext():
            k, v = it.next()
            print("   ", k, "-", v)
        # print out the decoded object stream
        content = pdf.DecodeObjectStream(content, dictn, oid)
        if not content:
            continue
        out = NTTextBuffer()
        out.printHex(content)
        print("\n", out.buffer)
            
if __name__ == "__main__":
    import sys
    proEngineInit()
    from Pro.Core import *
    from Pro.PDF import *
    parsePDF(sys.argv[1])
    proEngineFinal()

This is the output for a single PDF object:

OBJECT ID: 50 

    /Length - 371
    /Filter - /FlateDecode

         0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F    Ascii

0000   2F 43 49 44 49 6E 69 74  2F 50 72 6F 63 53 65 74    /CIDInit/ProcSet
0010   20 66 69 6E 64 72 65 73  6F 75 72 63 65 20 62 65     findresource be
0020   67 69 6E 0A 31 32 20 64  69 63 74 20 62 65 67 69    gin.12 dict begi
0030   6E 0A 62 65 67 69 6E 63  6D 61 70 0A 2F 43 49 44    n.begincmap./CID
0040   53 79 73 74 65 6D 49 6E  66 6F 3C 3C 0A 2F 52 65    SystemInfo<<./Re
0050   67 69 73 74 72 79 20 28  41 64 6F 62 65 29 0A 2F    gistry (Adobe)./
0060   4F 72 64 65 72 69 6E 67  20 28 55 43 53 29 0A 2F    Ordering (UCS)./
0070   53 75 70 70 6C 65 6D 65  6E 74 20 30 0A 3E 3E 20    Supplement 0.>> 
...
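
The parser above walks the object table and each dictionary with `hasNext()`/`next()` loops. A small adapter can turn such an iterator into a Python generator. This is only a sketch: `drain` and `ListIterator` are hypothetical names, and `ListIterator` is a stand-in that mimics the protocol so the example runs without the engine.

```python
def drain(it):
    """Yield every item from an iterator exposing hasNext() and next()."""
    while it.hasNext():
        yield it.next()

class ListIterator:
    """Hypothetical stand-in mimicking the hasNext()/next() protocol."""
    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item

# Same pattern as the parser: collect the keys, sort them,
# then shift right by 32 bits as the parser does when printing IDs.
pairs = ListIterator([(3 << 32, "c"), (1 << 32, "a")])
oids = sorted(oid for oid, _ in drain(pairs))
print([oid >> 32 for oid in oids])
```

With the SDK, the equivalent call would presumably be `sorted(oid for oid, _ in drain(objtable.iterator()))`.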

The code snippet below recursively scans all files within a given directory.

import os
from ProEngine import *

def scanFiles(scandir):
    rootdir = os.path.realpath(scandir)
    for dirname, _, filelist in os.walk(rootdir):
        print("Scanning directory: %s" % dirname)
        for fname in filelist:
            fullpath = os.path.join(dirname, fname)
            print('\tScanning %s... ' % fname, end ="")
            r = proEngineScanFile(fullpath, return_report=True)
            print("OK" if r else "FAIL")
            
if __name__ == "__main__":
    proEngineInit()
    scanFiles("/path/to/scan")
    proEngineFinal()
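
When the results need to be processed rather than printed, the same walk can collect them. The sketch below is an illustration only: `scan_tree` is a hypothetical helper that accepts any callable as the scanner, so it runs without the engine. With the SDK initialized, the callable could wrap `proEngineScanFile` as used above.

```python
import os

def scan_tree(scandir, scan):
    """Walk scandir and return (passed, failed) lists of file paths.

    scan is any callable that takes a path and returns a truthy
    value on success.
    """
    passed, failed = [], []
    for dirname, _, filelist in os.walk(os.path.realpath(scandir)):
        for fname in sorted(filelist):
            fullpath = os.path.join(dirname, fname)
            (passed if scan(fullpath) else failed).append(fullpath)
    return passed, failed

# Hypothetical usage with the engine:
#     passed, failed = scan_tree(
#         "/path/to/scan",
#         lambda p: proEngineScanFile(p, return_report=True))
```

Passing the scanner as a parameter keeps the directory traversal testable on its own.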

In C/C++, you can execute Python scripts, evaluate Python code, access Python variables, and scan files. The following code snippet demonstrates setting a global variable in Python and then fetching it in C.

#define PRO_ENGINE_INIT
#include "ProEngine.h"
#include <stdio.h>   /* puts */
#include <string.h>  /* strcmp */

int main()
{
    /* initialize the engine */
    if (!proEngineInit("/path/to/the/engine", ProEngine_InitPython))
        return -1;

    /* set a variable in Python and retrieve it from C */
    proEngineEvalPythonCode("some_value ='kawaii'");

    char *r = proEngineGetPythonGlobalVariable("some_value");

    if (r != NULL && strcmp(r, "kawaii") == 0)
        puts("SUCCESS");
    else
        puts("FAIL");

    proEngineFreeMemory(r);

    /* finalize the engine before exiting */
    proEngineFinal();
    return 0;
}

The SDK makes it trivial to exchange variables between Python and C/C++.

Speed

The SDK is crafted in Python, but the core engine is written in C++ and supports both multi-threading and multi-processing. This architecture ensures optimal speed and also empowers you to create cross-platform code compatible with both Cerbero Engine and Cerbero Suite.

Stability

With the SDK based in Python, there’s no stress about rebuilding your project with each engine update. Furthermore, we are committed to ensuring continuity; we conscientiously avoid introducing breaking changes to the SDK, so you can be confident that updates won’t disrupt your existing code.

Range

The SDK offers a comprehensive toolkit, covering support for a myriad of file formats, scanning, disassembly, decompilation, emulation, signature matching, file carving, decompression, decryption, and beyond.

Unsure if Cerbero Engine aligns with your needs? Feel free to reach out for a consultation.

Quality

We ensure that Cerbero Engine stays ahead of the curve, adeptly handling the latest threats and challenges posed by complex file formats. We offer state-of-the-art support, especially for challenging file types like Adobe PDF and Microsoft Office.

Flexible Licensing

We offer licensing for Cerbero Engine tailored to each specific case, with terms based on the project’s scope. For a personalized quotation, please reach out to us.

Suite Discounts

Acquiring a Cerbero Engine license also entitles your organization to discounted lab licenses of Cerbero Suite. Cerbero Suite empowers your engineers to interactively troubleshoot parsing challenges, delve into edge cases, utilize the Python editor for development, and craft graphical applications that seamlessly integrate with Cerbero Engine.

Priority Support

We recognize the crucial nature of enterprise applications. Thus, Cerbero Engine customers benefit from our priority support.