MegaPDF - Free PDF Converter, Editor, OCR & Unlock PDF

The Extract Content API extracts text blocks and images from PDF documents, together with the position of each item on its page. With this data you can build an editor that preserves the original document layout and modifies individual text blocks.

Endpoint

POST https://api.mega-pdf.com/api/pdf/extract-text

Authentication

Authenticate requests using an API key in the x-api-key header.

// Header example
x-api-key: your-api-key

Request Parameters

The API accepts multipart/form-data requests with the following parameters:

Parameter	Type	Description	Required
`file`	File	PDF file to extract content from (max 50 MB)	Yes
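
Because the file parameter has a size cap, it can be worth checking the file size before uploading. The following is a minimal sketch, not part of the MegaPDF API: the helper name isWithinUploadLimit is hypothetical, and it assumes the limit means 50 × 1024 × 1024 bytes, which the documentation does not specify.

```javascript
// Hypothetical client-side check, not part of the MegaPDF API.
// Assumption: the documented limit means 50 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;

// Returns true when the size is a positive whole number of bytes within the limit.
function isWithinUploadLimit(sizeInBytes) {
  return (
    Number.isInteger(sizeInBytes) &&
    sizeInBytes > 0 &&
    sizeInBytes <= MAX_UPLOAD_BYTES
  );
}
```

In Node.js the size could come from fs.statSync('document.pdf').size; in a browser, from the size property of the selected File.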

Example Request

Extract content from a PDF using cURL:

curl -X POST https://api.mega-pdf.com/api/pdf/extract-text \
  -H "x-api-key: your-api-key" \
  -F "file=@/path/to/document.pdf"

Response Format

Successful responses include detailed text and image content with positioning:

{
  "success": true,
  "message": "Content extracted successfully from 3 pages with 125 text blocks and 4 images",
  "extractedData": {
    "pages": [
      {
        "page_number": 1,
        "width": 612,
        "height": 792,
        "texts": [
          {
            "text": "Sample document title",
            "x0": 100.5,
            "y0": 50.2,
            "x1": 400.8,
            "y1": 75.3,
            "font": "Helvetica-Bold",
            "size": 18.0,
            "color": 0
          },
          // More text blocks...
        ],
        "images": [
          {
            "x0": 50.0,
            "y0": 100.0,
            "x1": 250.0,
            "y1": 300.0,
            "width": 200.0,
            "height": 200.0,
            "image_data": "base64-encoded-image-data...",
            "format": "jpeg",
            "image_id": "session_id_page1_img0"
          },
          // More images...
        ]
      },
      // More pages...
    ],
    "metadata": {
      "total_pages": 3,
      "total_text_blocks": 125,
      "total_images": 4,
      "extraction_method": "PyMuPDF Enhanced with Images"
    }
  },
  "sessionId": "unique-session-identifier",
  "originalName": "document.pdf",
  "billing": {
    "usedFreeOperation": true,
    "freeOperationsRemaining": 9,
    "currentBalance": 10.50,
    "operationCost": 0.00
  }
}

Error responses:

{
  "success": false,
  "error": "No content found in the PDF. The PDF may be empty or password protected."
}
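
Both response shapes carry a success flag, so a caller can turn the envelope into either a return value or a thrown error. A minimal sketch, assuming the parsed JSON body has exactly the fields shown above; the helper name unwrapExtractResponse is hypothetical.

```javascript
// Hypothetical helper: converts the { success, extractedData, error } envelope
// into a returned value or a thrown Error.
function unwrapExtractResponse(body) {
  if (body && body.success === true && body.extractedData) {
    return body.extractedData;
  }
  // Fall back to a generic reason when the error field is missing.
  const reason =
    body && typeof body.error === 'string' ? body.error : 'Unknown error';
  throw new Error(`Extract content failed: ${reason}`);
}
```

Note that the helper returns only extractedData; if the sessionId is needed later, read it from the body before unwrapping.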

Data Structure

The response includes detailed information about each text block and image:

Text Block Properties

Property	Type	Description
`text`	String	The actual text content
`x0`, `y0`	Float	Top-left corner coordinates
`x1`, `y1`	Float	Bottom-right corner coordinates
`font`	String	Font name (for example, Helvetica-Bold)
`size`	Float	Font size in points
`color`	Integer	RGB color value as an integer
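
The corner coordinates and the integer color usually need converting before they are useful in an editor. A minimal sketch, assuming the color integer packs its channels as 0xRRGGBB (so 0 is black); the documentation does not state the packing, and both helper names are hypothetical.

```javascript
// Assumption: `color` packs the channels as 0xRRGGBB, so 0 is black.
function colorToHex(color) {
  const rgb = color & 0xffffff; // keep only the lowest 24 bits
  return '#' + rgb.toString(16).padStart(6, '0');
}

// Width and height of a text block, derived from its corner coordinates.
function blockSize(block) {
  return { width: block.x1 - block.x0, height: block.y1 - block.y0 };
}
```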

Image Properties

Property	Type	Description
`x0`, `y0`	Float	Top-left corner coordinates
`x1`, `y1`	Float	Bottom-right corner coordinates
`width`, `height`	Float	Image dimensions in points
`image_data`	String	Base64-encoded image data
`format`	String	Image format (jpeg, png, etc.)
`image_id`	String	Unique identifier for the image
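
An extracted image can be displayed or saved by decoding its image_data field. A minimal sketch, assuming the format value can be used directly as the MIME subtype (jpeg gives image/jpeg, png gives image/png); the helper names are hypothetical, and imageToBuffer relies on the Node.js Buffer class.

```javascript
// Hypothetical helper: builds a data URI usable as an <img> src.
// Assumption: `format` can be used directly as the MIME subtype.
function imageToDataUri(image) {
  return `data:image/${image.format};base64,${image.image_data}`;
}

// Decodes the Base64 payload into raw bytes (Node.js Buffer).
function imageToBuffer(image) {
  return Buffer.from(image.image_data, 'base64');
}
```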

Code Examples

Using the Extract Content API with JavaScript:

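// Requires Node.js 18+ (built-in fetch, FormData, and Blob)
const fs = require('fs');

// The built-in FormData needs a Blob rather than a read stream
const pdfBlob = new Blob([fs.readFileSync('document.pdf')], { type: 'application/pdf' });
const formData = new FormData();
formData.append('file', pdfBlob, 'document.pdf');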

fetch('https://api.mega-pdf.com/api/pdf/extract-text', {
  method: 'POST',
  headers: {
    'x-api-key': 'your-api-key'
  },
  body: formData
})
  .then(response => response.json())
  .then(data => {
    if (data.success) {
      console.log('Content extracted successfully');
      console.log('Total pages:', data.extractedData.metadata.total_pages);
      console.log('Total text blocks:', data.extractedData.metadata.total_text_blocks);
      console.log('Total images:', data.extractedData.metadata.total_images);
      
      // Store the session ID for later use when saving edits
      const sessionId = data.sessionId;
      
      // Process the extracted data
      data.extractedData.pages.forEach(page => {
        console.log(`Page ${page.page_number} has ${page.texts.length} text blocks and ${page.images?.length || 0} images`);
        
        // Access text blocks for editing
        page.texts.forEach(textBlock => {
          console.log(`Text: "${textBlock.text.substring(0, 50)}..."`);
          console.log(`Position: (${textBlock.x0}, ${textBlock.y0}) to (${textBlock.x1}, ${textBlock.y1})`);
          console.log(`Font: ${textBlock.font} at ${textBlock.size}pt`);
        });
      });
    } else {
      console.error('Failed to extract content:', data.error);
    }
  })
  .catch(error => console.error('Error:', error));
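
To draw an editable overlay on top of a rendered page, the extracted coordinates have to be scaled to the viewer's pixel size. A minimal sketch, assuming the block coordinates use the same units as the page width and height and that the origin is the top-left corner with y increasing downward (as the property tables suggest); the helper name toViewerRect is hypothetical.

```javascript
// Hypothetical helper: maps a text block or image rectangle to pixel
// coordinates for a page rendered at `renderedWidthPx` pixels wide.
// Assumption: coordinates share units with page.width and page.height,
// with a top-left origin.
function toViewerRect(block, page, renderedWidthPx) {
  const scale = renderedWidthPx / page.width;
  return {
    left: block.x0 * scale,
    top: block.y0 * scale,
    width: (block.x1 - block.x0) * scale,
    height: (block.y1 - block.y0) * scale,
  };
}
```

The result can be applied directly as absolute CSS positioning for an overlay element inside a container that matches the rendered page.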

PDF Text Editor API