DomTree Definition

In RAG (Retrieval-Augmented Generation) systems, high-quality document parsing is the foundation for accurate and efficient downstream tasks. As a core component of the document parsing module, the DomTree protocol explicitly represents a document's hierarchical and semantic relationships, turning complex, heterogeneous raw documents into a logical tree structure that programs can traverse and reason over.

Structure Definition

| Field Name | Field Description | Data Type |
| --- | --- | --- |
| root | Root node | Node |
| source_file | Document source | object |
| id | File ID | string |
| name | File name | string |
| type | e.g. pdf | string |
| mime_type | e.g. application/pdf | string |
| version | File version number | number |
| summary | Summary | string |
| tokens | Estimated token count | number |
| path | Hierarchical numbering, e.g. [1,2,1] | array[number] |
| element | Element information | Element |
| type | One of: "Text", "Title", "List", "Catalog", "Table", "Figure", "Formula", "Code", "ListItem" | string |
| positions | Position information; an array because an element may span pages | array[Position] |
| bbox | Rectangle coordinates in the document, e.g. [90.1,263.8,101.8,274.3] | array[double] |
| page | Page number | integer |
| name | Name, if type is Table or Figure | string |
| description | Description, if type is Table or Figure | string |
| text | Text content, or OCR text for images | string |
| image | Image information | image |
| type | One of image_url, image_base64, image_file | string |
| url | Link address | string |
| base64 | Base64-encoded image | string |
| file_id | ID of the file uploaded to file-api | string |
| rows | Table-specific attribute: table rows | array[Cell] |
| cells | Cell attributes | Cell |
| path | Cell position in the table: [start row, end row, start column, end column] | array[number] |
| text | Text | string |
| nodes | Used for cells with complex content; path numbering restarts within these nodes | array[Node] |
| children | Child node information | array[Node] |
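
To make the field definitions more concrete, here is a minimal Python sketch that builds a small DomTree-like document as plain dictionaries and walks it depth-first. It is not an official client. The nesting is an assumption inferred from the row order of the field table (source_file and root at the top level; path, tokens, element and children on each node; positions, text, name, description and rows inside element). The sample values, the empty root path, the cell indexing, and the helper names `iter_nodes`, `outline` and `find_by_type` are likewise hypothetical.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical DomTree document: every value is made up, and the nesting is
# an assumption inferred from the row order of the field table.
doc: Dict[str, Any] = {
    "source_file": {
        "id": "file-001",
        "name": "report.pdf",
        "type": "pdf",
        "mime_type": "application/pdf",
        "version": 1,
    },
    "root": {
        "path": [],       # assumption: the root carries no numbering
        "element": None,  # assumption: the root has no element of its own
        "children": [
            {
                "path": [1],
                "tokens": 2,
                "element": {
                    "type": "Title",
                    "positions": [{"bbox": [90.1, 263.8, 101.8, 274.3], "page": 1}],
                    "text": "Introduction",
                },
                "children": [
                    {
                        "path": [1, 1],
                        "tokens": 9,
                        "element": {
                            "type": "Text",
                            "text": "DomTree represents a document as a tree.",
                        },
                        "children": [],
                    },
                    {
                        "path": [1, 2],
                        "element": {
                            "type": "Table",
                            "name": "Table 1",
                            "description": "Supported file types",
                            # Assumption: rows is a flat list of cells, each
                            # located by [start row, end row, start col, end col].
                            "rows": [
                                {"path": [0, 0, 0, 0], "text": "Type"},
                                {"path": [1, 1, 0, 0], "text": "pdf"},
                            ],
                        },
                        "children": [],
                    },
                ],
            }
        ],
    },
}


def iter_nodes(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a node and all of its descendants, depth-first in document order."""
    yield node
    for child in node.get("children") or []:
        yield from iter_nodes(child)


def outline(root: Dict[str, Any]) -> List[str]:
    """Render one line per node that has an element: numbering, type, label."""
    lines = []
    for node in iter_nodes(root):
        element = node.get("element")
        if not element:
            continue  # skip nodes without an element, such as the sample root
        number = ".".join(str(n) for n in node.get("path", []))
        label = element.get("text") or element.get("name") or ""
        lines.append(f"{number} [{element['type']}] {label}".strip())
    return lines


def find_by_type(root: Dict[str, Any], element_type: str) -> List[Dict[str, Any]]:
    """Collect the elements whose type matches, e.g. "Table" or "Figure"."""
    return [
        node["element"]
        for node in iter_nodes(root)
        if node.get("element") and node["element"].get("type") == element_type
    ]


for line in outline(doc["root"]):
    print(line)
```

Because every node exposes the same children field, one recursive generator is enough to support outlines, type filters, or chunking for retrieval. A real payload may nest fields differently, so the key lookups should be adjusted to match the actual service output.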