PDF Text Editor API

Extract, edit, and update text content in PDF documents while preserving layout and images.

Extract Content API
Extract text blocks and images from PDF documents for editing

The Extract Content API returns the text blocks and images of a PDF document together with their bounding-box coordinates. Text blocks also carry font, size, and color, so an editor can reproduce the original layout and modify individual blocks.

Endpoint

POST https://api.mega-pdf.com/api/pdf/extract-text

Authentication

Authenticate requests using an API key in the x-api-key header.

// Header example
x-api-key: your-api-key

Request Parameters

The API accepts multipart/form-data requests with the following parameters:

Parameter | Type | Description | Required
file | File | PDF file to extract content from (max 50MB) | Yes
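
Checking the file locally before uploading can avoid a wasted request. The sketch below is illustrative only: the helper name validatePdfUpload is hypothetical, and it assumes that "50MB" means 50 * 1024 * 1024 bytes, which the API may count differently.

```javascript
// Assumption: the 50MB limit is interpreted as 50 * 1024 * 1024 bytes.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: returns an error message, or null when the
// bytes look safe to upload. `bytes` is a Buffer or Uint8Array.
function validatePdfUpload(bytes, maxBytes = MAX_PDF_BYTES) {
  if (bytes.length === 0) return 'File is empty';
  if (bytes.length > maxBytes) return 'File exceeds the size limit';
  // Well-formed PDF files begin with the ASCII marker "%PDF-".
  const header = Buffer.from(bytes.subarray(0, 5)).toString('latin1');
  if (header !== '%PDF-') return 'File does not look like a PDF';
  return null;
}
```

A caller would read the file into a Buffer, pass it to validatePdfUpload, and only build the multipart request when the result is null.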

Example Request

Extract content from a PDF using cURL:

curl -X POST https://api.mega-pdf.com/api/pdf/extract-text \
  -H "x-api-key: your-api-key" \
  -F "file=@/path/to/document.pdf"

Response Format

Successful responses include detailed text and image content with positioning:

{
  "success": true,
  "message": "Content extracted successfully from 3 pages with 125 text blocks and 4 images",
  "extractedData": {
    "pages": [
      {
        "page_number": 1,
        "width": 612,
        "height": 792,
        "texts": [
          {
            "text": "Sample document title",
            "x0": 100.5,
            "y0": 50.2,
            "x1": 400.8,
            "y1": 75.3,
            "font": "Helvetica-Bold",
            "size": 18.0,
            "color": 0
          },
          // More text blocks...
        ],
        "images": [
          {
            "x0": 50.0,
            "y0": 100.0,
            "x1": 250.0,
            "y1": 300.0,
            "width": 200.0,
            "height": 200.0,
            "image_data": "base64-encoded-image-data...",
            "format": "jpeg",
            "image_id": "session_id_page1_img0"
          },
          // More images...
        ]
      },
      // More pages...
    ],
    "metadata": {
      "total_pages": 3,
      "total_text_blocks": 125,
      "total_images": 4,
      "extraction_method": "PyMuPDF Enhanced with Images"
    }
  },
  "sessionId": "unique-session-identifier",
  "originalName": "document.pdf",
  "billing": {
    "usedFreeOperation": true,
    "freeOperationsRemaining": 9,
    "currentBalance": 10.50,
    "operationCost": 0.00
  }
}
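
The nested structure above can be walked with a few lines of code. The following is a minimal sketch, assuming the field names are exactly those in the sample response and that a page may omit its images array; the helper name summarizeExtraction is hypothetical.

```javascript
// Hypothetical helper: counts pages, text blocks, and images by walking
// extractedData.pages, so the totals can be compared with metadata.
function summarizeExtraction(response) {
  const pages = response.extractedData.pages;
  let textBlocks = 0;
  let images = 0;
  for (const page of pages) {
    textBlocks += page.texts.length;
    // Assumption: "images" may be missing on pages without images.
    images += (page.images || []).length;
  }
  return { pages: pages.length, textBlocks, images };
}
```

Comparing the result with extractedData.metadata is a quick sanity check that the response was parsed completely.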

Error responses set success to false and describe the problem in the error field:

{
  "success": false,
  "error": "No content found in the PDF. The PDF may be empty or password protected."
}
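
One way to handle both outcomes in a single place is to convert a failed response into an exception. This is only a sketch: ExtractError and unwrapExtractResponse are hypothetical names, and the fallback message covers the assumed case where the error field is missing.

```javascript
// Hypothetical error type for failed extractions.
class ExtractError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ExtractError';
  }
}

// Returns the useful parts of a successful response, or throws
// an ExtractError carrying the API's error message.
function unwrapExtractResponse(data) {
  if (!data || data.success !== true) {
    const reason = data && typeof data.error === 'string'
      ? data.error
      : 'Unknown extraction error'; // assumption: error may be absent
    throw new ExtractError(reason);
  }
  return { extractedData: data.extractedData, sessionId: data.sessionId };
}
```

Calling code can then wrap unwrapExtractResponse in try/catch instead of testing the success flag at every call site.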

Data Structure

The response includes detailed information about each text block and image:

Text Block Properties

Property | Type | Description
text | String | The actual text content
x0, y0 | Float | Top-left corner coordinates
x1, y1 | Float | Bottom-right corner coordinates
font | String | Font family name
size | Float | Font size in points
color | Integer | RGB color value as an integer
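
An editor usually needs a block's dimensions and a CSS-style color. The sketch below derives both from a text block. It assumes the color integer is packed as 0xRRGGBB, which matches how PyMuPDF reports span colors but is not stated by this API; colorToHex and blockSize are hypothetical helper names.

```javascript
// Assumption: color is packed as 0xRRGGBB (red in the high byte).
function colorToHex(color) {
  const rgb = color & 0xffffff; // keep only the low 24 bits
  return '#' + rgb.toString(16).padStart(6, '0');
}

// Width and height of a block's bounding box, in the same units
// as the x0/y0/x1/y1 coordinates.
function blockSize(block) {
  return { width: block.x1 - block.x0, height: block.y1 - block.y0 };
}
```

Under that packing assumption, the sample block's color value of 0 maps to '#000000' (black).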

Image Properties

Property | Type | Description
x0, y0 | Float | Top-left corner coordinates
x1, y1 | Float | Bottom-right corner coordinates
width, height | Float | Image dimensions in points
image_data | String | Base64-encoded image data
format | String | Image format (jpeg, png, etc.)
image_id | String | Unique identifier for the image
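
To display or save an extracted image, the Base64 payload has to be decoded. The following sketch assumes image_data is plain Base64 without a "data:" prefix and that format can be used directly as the MIME subtype (for example "jpeg" becomes image/jpeg); both helper names are hypothetical.

```javascript
// Builds a data URI suitable for an <img src="..."> attribute.
// Assumption: format maps directly to the MIME subtype.
function imageToDataUri(image) {
  return `data:image/${image.format};base64,${image.image_data}`;
}

// Decodes the Base64 payload into raw bytes (a Node.js Buffer),
// for example to write the image to disk.
function imageToBuffer(image) {
  return Buffer.from(image.image_data, 'base64');
}
```

In Node.js, the Buffer returned by imageToBuffer can be passed to fs.writeFileSync together with a file name derived from image_id and format.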

Code Examples

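Using the Extract Content API with JavaScript (Node.js 18 or later, which provides global fetch, FormData, and Blob):

const fs = require('fs');

// Read the PDF into memory and wrap it in a Blob so the built-in FormData can send it
const pdfBuffer = fs.readFileSync('document.pdf');
const formData = new FormData();
formData.append('file', new Blob([pdfBuffer], { type: 'application/pdf' }), 'document.pdf');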

fetch('https://api.mega-pdf.com/api/pdf/extract-text', {
  method: 'POST',
  headers: {
    'x-api-key': 'your-api-key'
  },
  body: formData
})
  .then(response => response.json())
  .then(data => {
    if (data.success) {
      console.log('Content extracted successfully');
      console.log('Total pages:', data.extractedData.metadata.total_pages);
      console.log('Total text blocks:', data.extractedData.metadata.total_text_blocks);
      console.log('Total images:', data.extractedData.metadata.total_images);
      
      // Store the session ID for later use when saving edits
      const sessionId = data.sessionId;
      
      // Process the extracted data
      data.extractedData.pages.forEach(page => {
        console.log(`Page ${page.page_number} has ${page.texts.length} text blocks and ${page.images?.length || 0} images`);
        
        // Access text blocks for editing
        page.texts.forEach(textBlock => {
          console.log(`Text: "${textBlock.text.substring(0, 50)}..."`);
          console.log(`Position: (${textBlock.x0}, ${textBlock.y0}) to (${textBlock.x1}, ${textBlock.y1})`);
          console.log(`Font: ${textBlock.font} at ${textBlock.size}pt`);
        });
      });
    } else {
      console.error('Failed to extract content:', data.error);
    }
  })
  .catch(error => console.error('Error:', error));
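
Once the pages are available, a plain-text view of a page can be rebuilt from its blocks. This is a rough sketch under the assumption that the origin is the top-left corner with y increasing downward, so sorting by y0 and then x0 approximates reading order; it does not handle multi-column layouts. The helper name pageToPlainText is hypothetical.

```javascript
// Joins a page's text blocks top to bottom, then left to right.
// Assumption: smaller y0 means higher on the page.
function pageToPlainText(page) {
  return [...page.texts] // copy so the original order is untouched
    .sort((a, b) => a.y0 - b.y0 || a.x0 - b.x0)
    .map(block => block.text)
    .join('\n');
}
```

The same sort order can be reused when rendering editable text fields, so that tabbing between them follows the visual order of the page.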