Description
Calling XWPFDocument#setParagraph(XWPFParagraph paragraph, int pos) may cause
an inconsistency between the internal bodyElements and paragraphs lists.
After calling setParagraph, the element stored at the same position in
bodyElements and paragraphs may no longer refer to the same paragraph
instance.
Impact
This inconsistency breaks XWPFDocument#removeBodyElement(int pos).
removeBodyElement relies on getParagraphPos(int bodyPos) to locate the
corresponding paragraph index. When the paragraph was previously replaced via
setParagraph, getParagraphPos may return -1, so the subsequent
paragraphs.remove(paraPos) call is made with an invalid index and throws an
IndexOutOfBoundsException.
Root Cause Analysis
In setParagraph, two different update mechanisms are used:
- The paragraphs list is updated via ArrayList#set, directly replacing the
  paragraph reference.
- The underlying XML (CTDocument) is updated via
  ctDocument.getBody().setPArray(...).
During XML processing, the generated XMLBeans code eventually calls
XObj.copy_contents_from, which copies the XML contents instead of
reusing the existing CTP / XWPFParagraph instance.
As a result, the paragraph object referenced by paragraphs differs from the
one created and stored in bodyElements, leading to inconsistent internal
state.
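The mechanism can be illustrated without POI. The following is a minimal stand-alone model, not POI code: the class ParagraphListModel, the nested Para type, and the method bodies are hypothetical stand-ins that only mirror the names used in this report. It assumes, following the description above, that after setParagraph the two lists hold different instances at the same position, and that the position lookup compares by object identity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two internal lists; not the POI implementation.
class ParagraphListModel {

    // Stand-in for XWPFParagraph; only object identity matters here.
    static final class Para {
        final String text;
        Para(String text) { this.text = text; }
    }

    final List<Para> bodyElements = new ArrayList<>();
    final List<Para> paragraphs = new ArrayList<>();

    ParagraphListModel(int count) {
        // Initially both lists share the same instances, position by position.
        for (int i = 0; i < count; i++) {
            Para p = new Para("p" + i);
            bodyElements.add(p);
            paragraphs.add(p);
        }
    }

    // Assumption: models the reported effect. paragraphs receives the caller's
    // reference, while bodyElements ends up with a different instance (a copy).
    void setParagraph(Para paragraph, int pos) {
        paragraphs.set(pos, paragraph);
        bodyElements.set(pos, new Para(paragraph.text));
    }

    // Identity-based lookup: returns -1 when no paragraph is the same instance.
    int getParagraphPos(int bodyPos) {
        Para target = bodyElements.get(bodyPos);
        for (int i = Math.min(bodyPos, paragraphs.size() - 1); i >= 0; i--) {
            if (paragraphs.get(i) == target) {
                return i;
            }
        }
        return -1;
    }

    // Uses the looked-up index without checking it, as described under Impact.
    void removeBodyElement(int bodyPos) {
        int paraPos = getParagraphPos(bodyPos);
        paragraphs.remove(paraPos); // throws when paraPos is -1
        bodyElements.remove(bodyPos);
    }

    public static void main(String[] args) {
        ParagraphListModel doc = new ParagraphListModel(8);
        doc.setParagraph(doc.paragraphs.get(5), 6);
        System.out.println("paragraph index for body position 6: "
                + doc.getParagraphPos(6));
        try {
            doc.removeBodyElement(6);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("removeBodyElement failed: " + e);
        }
    }
}
```

In this model the lookup for body position 6 returns -1 after the replacement, and the removal then fails on the invalid index. A fix along these lines would need to keep both lists pointing at the same instance, or make the lookup tolerate a replaced paragraph.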
Steps to Reproduce
A sample DOCX file is attached.
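import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public static void main(String[] args) throws IOException {
    // Open the attached sample; both resources are closed on exit.
    try (FileInputStream fis = new FileInputStream("test_1989242873218412545.docx");
         XWPFDocument document = new XWPFDocument(fis)) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        // Place the paragraph from index 5 at index 6.
        document.setParagraph(paragraphs.get(5), 6);
        // For debugging: inspect internal state after setParagraph
        System.out.println("--");
    }
}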
Expected Behavior
After calling setParagraph, the internal bodyElements and paragraphs
collections should remain consistent, and subsequent calls to
removeBodyElement should work correctly.
Actual Behavior
bodyElements and paragraphs become inconsistent, causing
removeBodyElement to fail when removing a paragraph.
Additional Information
I have identified the cause and implemented a local fix.
A Pull Request will be submitted shortly.