Tim's PHP Scripts

Extract Text from a Word DOCX document

(docxtotext)

This PHP script extracts all the text from a Word DOCX document.

Requires PHP 5 or later. It also works on PHP 8.2 and later.

Features

This PHP class takes a Word document in DOCX format and extracts all of its text. The text includes all list and paragraph numbering, as well as footnotes and endnotes together with their reference numbers. The text is output as an array, with one array element per paragraph, which makes it easy to search or manipulate the text or to save it to a database. For convenience, the first element [0] of the array contains the number of text array elements and the length of the longest element, in the format 'Number:Length'. In normal mode the class produces no output to the screen.
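As a rough illustration of working with an array in this shape, the sketch below splits the 'Number:Length' summary in element [0] and searches the remaining elements for a word. It does not call the class itself: the $text array is hand-written stand-in data, and the helper names parseSummary() and findParagraphs() are hypothetical, not part of the script. It also assumes that the count refers to the paragraphs following element [0] and that the length is measured in characters; check the output of the real class before relying on either.

```php
<?php
// Hypothetical helper (not part of the class): split the 'Number:Length'
// summary held in element [0] into two integers.
function parseSummary(array $text)
{
    list($count, $longest) = array_map('intval', explode(':', $text[0], 2));
    return array('count' => $count, 'longest' => $longest);
}

// Hypothetical helper (not part of the class): return every paragraph that
// contains $needle (case-insensitive), keyed by its original array index.
function findParagraphs(array $text, $needle)
{
    $hits = array();
    // Skip element [0], which holds the summary rather than document text.
    foreach (array_slice($text, 1, null, true) as $index => $paragraph) {
        if (stripos($paragraph, $needle) !== false) {
            $hits[$index] = $paragraph;
        }
    }
    return $hits;
}

// Stand-in data shaped like the description above. In real use this array
// would come from the class; the values here are invented for illustration.
$text = array(
    '3:28',                          // assumed meaning: 3 paragraphs, longest is 28 characters
    '1. Introduction',
    'This is the first paragraph.',
    '[1] A footnote.',
);

$summary = parseSummary($text);
printf("Paragraphs: %d, longest: %d characters\n", $summary['count'], $summary['longest']);

foreach (findParagraphs($text, 'footnote') as $index => $paragraph) {
    printf("Match in element %d: %s\n", $index, $paragraph);
}
```

Because each paragraph is a separate array element, the same loop could just as easily write each element to a database row instead of printing it.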

The latest version of this script (v.1.0.2) can be downloaded from either of the following:

Github - https://github.com/timy352/docxtotext

PHP Classes - https://www.phpclasses.org/package/12274-PHP-Extract-text-from-Microsoft-Word-DOCX-files.html