Package FLOTF
Class PDFDimensionFinder
- java.lang.Object
  - org.apache.pdfbox.contentstream.PDFStreamEngine
    - org.apache.pdfbox.text.PDFTextStripper
      - org.apache.pdfbox.text.PDFTextStripperByArea
        - FLOTF.PDFDimensionFinder
public class PDFDimensionFinder extends org.apache.pdfbox.text.PDFTextStripperByArea
This class extends PDFTextStripperByArea to determine the pixel locations of the rows on the table.
- Author:
- Bernard Chan, Sonali Loomba
- See Also:
- https://pdfbox.apache.org/docs/2.0.1/javadocs/org/apache/pdfbox/text/PDFTextStripperByArea.html
Field Summary
Fields
- private boolean firstChar
  Determines if processTextPosition is on the first character
- private int pageOffset
  Number of pixels that the page is offset to the right from a left-leaning page
- private java.util.ArrayList<java.lang.Integer> rowYCoordinates
  Stores the Y-Coordinate positions of the rows in the table
-
Constructor Summary
Constructors
- PDFDimensionFinder()
  PDFDimensionFinder constructor
-
Method Summary
All Methods (Instance Methods, Concrete Methods)
- protected void findOffset(org.apache.pdfbox.text.TextPosition pos)
  Looks at a specific TextPosition character and determines the pageOffset based on the position of the character
- protected void findRowYCoordinates(org.apache.pdfbox.text.TextPosition pos)
  Determines the rowYCoordinates based on the Y-Coordinate positions of the first column of entries
- int getPageOffset()
  Gets the pageOffset
- java.util.ArrayList<java.lang.Integer> getRowYCoordinates()
  Gets the rowYCoordinates
- protected void processTextPosition(org.apache.pdfbox.text.TextPosition pos)
  Processes the PDF character by character, determining the page's lean from the first character and examining every character to decide which Y-Coordinates to add to rowYCoordinates
- protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3)
-
Methods inherited from class org.apache.pdfbox.text.PDFTextStripperByArea
addRegion, extractRegions, getRegions, getTextForRegion, removeRegion, setShouldSeparateByBeads, writePage
-
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
-
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
-
-
-
-
Field Detail
-
rowYCoordinates
private java.util.ArrayList<java.lang.Integer> rowYCoordinates
Stores the Y-Coordinate positions of the rows in the table
-
firstChar
private boolean firstChar
Determines if processTextPosition is on the first character
-
pageOffset
private int pageOffset
Number of pixels that the page is offset to the right from a left-leaning page
-
-
Method Detail
-
processTextPosition
protected void processTextPosition(org.apache.pdfbox.text.TextPosition pos)
Processes the PDF character by character. The first character is used to determine the page's lean (the pageOffset), and every character is examined to decide which Y-Coordinates to add to rowYCoordinates.
- Overrides:
- processTextPosition in class org.apache.pdfbox.text.PDFTextStripperByArea
- Parameters:
- pos - the TextPosition object representing the current character
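The dispatch described above (offset from the first character, row coordinates from every character) can be sketched in plain Java. This is an illustration only, not the class's actual source: GlyphPosition is a hypothetical stand-in for org.apache.pdfbox.text.TextPosition, and the bodies of findOffset and findRowYCoordinates are assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TextPosition; only the x and y positions are modeled.
final class GlyphPosition {
    final float x;
    final float y;

    GlyphPosition(float x, float y) {
        this.x = x;
        this.y = y;
    }
}

// Sketch of the "first character flag" pattern; not the real PDFDimensionFinder.
class DimensionFinderSketch {
    private boolean firstChar = true;
    private int pageOffset = 0;
    private final List<Integer> rowYCoordinates = new ArrayList<>();

    void processTextPosition(GlyphPosition pos) {
        if (firstChar) {
            // Only the very first character contributes to the page offset.
            findOffset(pos);
            firstChar = false;
        }
        // Every character is a candidate for a new row coordinate.
        findRowYCoordinates(pos);
    }

    // Assumption: the offset is simply the rounded X position of the first character.
    void findOffset(GlyphPosition pos) {
        pageOffset = Math.round(pos.x);
    }

    // Assumption: each distinct rounded Y position is recorded once, in order of appearance.
    void findRowYCoordinates(GlyphPosition pos) {
        int y = Math.round(pos.y);
        if (!rowYCoordinates.contains(y)) {
            rowYCoordinates.add(y);
        }
    }

    int getPageOffset() {
        return pageOffset;
    }

    List<Integer> getRowYCoordinates() {
        return new ArrayList<>(rowYCoordinates);
    }
}
```

In the real class the callback is invoked by the PDFBox stream engine during text extraction rather than called by hand.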
-
findOffset
protected void findOffset(org.apache.pdfbox.text.TextPosition pos)
Looks at a specific TextPosition character and determines the pageOffset based on the position of the character.
- Parameters:
- pos - the TextPosition object representing the current character
-
findRowYCoordinates
protected void findRowYCoordinates(org.apache.pdfbox.text.TextPosition pos)
Determines the rowYCoordinates based on the Y-Coordinate positions of the first column of entries.
- Parameters:
- pos - the TextPosition object representing the current character
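One way to restrict row detection to the first column is to ignore characters whose X position lies beyond a column boundary and to merge Y positions that are nearly equal. The sketch below assumes this approach; the names firstColumnMaxX and tolerance are hypothetical parameters that do not appear in the documented class, and the actual implementation may decide differently.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: collects one Y coordinate per table row from first-column characters.
class FirstColumnRowCollector {
    private final float firstColumnMaxX; // hypothetical right edge of the first column
    private final int tolerance;         // hypothetical: Y values this close count as one row
    private final List<Integer> rowYCoordinates = new ArrayList<>();

    FirstColumnRowCollector(float firstColumnMaxX, int tolerance) {
        this.firstColumnMaxX = firstColumnMaxX;
        this.tolerance = tolerance;
    }

    // x and y play the role of a TextPosition's position on the page.
    void accept(float x, float y) {
        if (x > firstColumnMaxX) {
            return; // character is outside the first column
        }
        int rounded = Math.round(y);
        for (int known : rowYCoordinates) {
            if (Math.abs(known - rounded) <= tolerance) {
                return; // same row as an already recorded coordinate
            }
        }
        rowYCoordinates.add(rounded);
    }

    List<Integer> getRowYCoordinates() {
        return new ArrayList<>(rowYCoordinates);
    }
}
```

The tolerance guards against small baseline differences between characters of the same row, which would otherwise produce duplicate rows.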
-
getRowYCoordinates
public java.util.ArrayList<java.lang.Integer> getRowYCoordinates()
Gets the rowYCoordinates.
- Returns:
- an ArrayList specifying the Y-Coordinates of the rows in the table
-
getPageOffset
public int getPageOffset()
Gets the pageOffset.
- Returns:
- an int specifying the pageOffset
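A caller might combine the two getters to build one rectangle per table row. The following sketch shows that idea under several assumptions: the Y coordinates are sorted from top to bottom and mark the top of each row, and tableWidth and tableBottom are hypothetical values supplied by the caller. RowRegionBuilder is not part of the FLOTF package.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Sketch only: turns row Y coordinates and a page offset into row rectangles.
class RowRegionBuilder {
    static List<Rectangle2D> buildRowRegions(List<Integer> rowYs, int pageOffset,
                                             float tableWidth, float tableBottom) {
        List<Rectangle2D> regions = new ArrayList<>();
        for (int i = 0; i < rowYs.size(); i++) {
            float top = rowYs.get(i);
            // A row ends where the next one starts; the last row ends at tableBottom.
            float bottom = (i + 1 < rowYs.size()) ? rowYs.get(i + 1) : tableBottom;
            // The page offset shifts every region to the right by the same amount.
            regions.add(new Rectangle2D.Float(pageOffset, top, tableWidth, bottom - top));
        }
        return regions;
    }
}
```

Each resulting rectangle could then be registered through the inherited addRegion(String, Rectangle2D) method and read back with getTextForRegion after extractRegions has run.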
-
showGlyph
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3) throws java.io.IOException
- Overrides:
- showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
- Throws:
- java.io.IOException
-
-