Package FLOTF
Class PDFDimensionFinder
- java.lang.Object
-
- org.apache.pdfbox.contentstream.PDFStreamEngine
-
- org.apache.pdfbox.text.PDFTextStripper
-
- org.apache.pdfbox.text.PDFTextStripperByArea
-
- FLOTF.PDFDimensionFinder
-
public class PDFDimensionFinder extends org.apache.pdfbox.text.PDFTextStripperByArea
This class extends PDFTextStripperByArea to determine the pixel locations of the rows on the table.
- Author:
- Bernard Chan, Sonali Loomba
- See Also:
- https://pdfbox.apache.org/docs/2.0.1/javadocs/org/apache/pdfbox/text/PDFTextStripperByArea.html
-
Field Summary
Fields (modifier and type, field, description)
- private boolean firstChar: Determines if processTextPosition is on the first character
- private int pageOffset: Number of pixels that the page is offset to the right from a left-leaning page
- private java.util.ArrayList<java.lang.Integer> rowYCoordinates: Stores the Y-Coordinate positions of the rows in the table
-
Constructor Summary
Constructors (constructor, description)
- PDFDimensionFinder(): PDFDimensionFinder constructor
-
Method Summary
All Methods: Instance Methods, Concrete Methods (modifier and type, method, description)
- protected void findOffset(org.apache.pdfbox.text.TextPosition pos): Looks at a specific TextPosition character and determines the pageOffset based on the position of the character
- protected void findRowYCoordinates(org.apache.pdfbox.text.TextPosition pos): Determines the rowYCoordinates based on the Y-Coordinate positions of the first column of entries
- int getPageOffset(): Gets the pageOffset
- java.util.ArrayList<java.lang.Integer> getRowYCoordinates(): Gets the rowYCoordinates
- protected void processTextPosition(org.apache.pdfbox.text.TextPosition pos): Processes all the characters in the PDF character by character, determining the page's lean based on the first character and also looking at all the characters to determine what Y-Coordinates to add to rowYCoordinates
- protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3)
-
Methods inherited from class org.apache.pdfbox.text.PDFTextStripperByArea
addRegion, extractRegions, getRegions, getTextForRegion, removeRegion, setShouldSeparateByBeads, writePage
-
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
-
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
-
Field Detail
-
rowYCoordinates
private java.util.ArrayList<java.lang.Integer> rowYCoordinates
Stores the Y-Coordinate positions of the rows in the table
-
firstChar
private boolean firstChar
Determines if processTextPosition is on the first character
-
pageOffset
private int pageOffset
Number of pixels that the page is offset to the right from a left-leaning page
-
Method Detail
-
processTextPosition
protected void processTextPosition(org.apache.pdfbox.text.TextPosition pos)
Processes the PDF character by character, determining the page's lean from the first character and checking every character to decide which Y-Coordinates to add to rowYCoordinates
- Overrides:
- processTextPosition in class org.apache.pdfbox.text.PDFTextStripperByArea
- Parameters:
- pos - the TextPosition object representing the current character
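The flow described above can be sketched without PDFBox on the classpath. This is only an illustration under stated assumptions, not the class's actual source: Glyph is a hypothetical stand-in for TextPosition, LEFT_LEANING_X is an invented reference value, and the bodies of findOffset and findRowYCoordinates are guesses based on the descriptions on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.pdfbox.text.TextPosition: one glyph and its position.
final class Glyph {
    final String text;
    final float x;
    final float y;

    Glyph(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

class DimensionFinderSketch {
    // Assumed X position of the first character on a left-leaning page (illustrative value).
    static final int LEFT_LEANING_X = 50;

    private boolean firstChar = true;
    private int pageOffset = 0;
    private final List<Integer> rowYCoordinates = new ArrayList<>();

    // The first glyph fixes the page offset; every glyph is then checked for a new row.
    void processTextPosition(Glyph pos) {
        if (firstChar) {
            findOffset(pos);
            firstChar = false;
        }
        findRowYCoordinates(pos);
    }

    // Assumption: the offset is the first glyph's X relative to the left-leaning reference.
    void findOffset(Glyph pos) {
        pageOffset = Math.round(pos.x) - LEFT_LEANING_X;
    }

    // Assumption: a row is recorded once per distinct rounded Y value.
    void findRowYCoordinates(Glyph pos) {
        int y = Math.round(pos.y);
        if (!rowYCoordinates.contains(y)) {
            rowYCoordinates.add(y);
        }
    }

    int getPageOffset() {
        return pageOffset;
    }

    List<Integer> getRowYCoordinates() {
        return rowYCoordinates;
    }
}
```

In the real class the same dispatch would sit inside the processTextPosition override, with TextPosition supplying the coordinates.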
-
findOffset
protected void findOffset(org.apache.pdfbox.text.TextPosition pos)
Looks at a specific TextPosition character and determines the pageOffset based on the position of the character
- Parameters:
- pos - the TextPosition object representing the current character
-
findRowYCoordinates
protected void findRowYCoordinates(org.apache.pdfbox.text.TextPosition pos)
Determines the rowYCoordinates based on the Y-Coordinate positions of the first column of entries
- Parameters:
- pos - the TextPosition object representing the current character
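One way to restrict the row search to the first column is to ignore glyphs whose X position falls outside that column's horizontal band. The sketch below assumes that approach. Pos is a hypothetical stand-in for TextPosition, and colLeft and colRight are invented parameters that do not appear in the documented signature.

```java
import java.util.ArrayList;
import java.util.List;

class FirstColumnRows {
    // Hypothetical stand-in for TextPosition: only the coordinates matter here.
    static final class Pos {
        final float x;
        final float y;

        Pos(float x, float y) {
            this.x = x;
            this.y = y;
        }
    }

    // Collects one rounded Y per row, looking only at glyphs with X in [colLeft, colRight).
    static List<Integer> rowYCoordinates(List<Pos> glyphs, float colLeft, float colRight) {
        List<Integer> rows = new ArrayList<>();
        for (Pos p : glyphs) {
            if (p.x < colLeft || p.x >= colRight) {
                continue; // glyph lies outside the first column
            }
            int y = Math.round(p.y);
            if (!rows.contains(y)) {
                rows.add(y);
            }
        }
        return rows;
    }
}
```

Rounding to whole units keeps glyphs on the same baseline from producing near-duplicate rows; whether the real method rounds or truncates is not stated in this page.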
-
getRowYCoordinates
public java.util.ArrayList<java.lang.Integer> getRowYCoordinates()
Gets the rowYCoordinates
- Returns:
- an ArrayList specifying the Y-Coordinates of the rows in the table
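A caller might turn the returned Y-Coordinates into one rectangle per table row, for example to pass to the inherited addRegion(String, Rectangle2D). The sketch below covers only that geometry step and makes assumptions: the list is sorted top to bottom, each row ends where the next begins, and the helper name RowBands.fromRowYs is invented for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

class RowBands {
    // Builds one rectangle per pair of consecutive row Y-coordinates, spanning x to x + width.
    // The last Y value has no successor, so it yields no band of its own.
    static List<Rectangle2D> fromRowYs(List<Integer> rowYs, double x, double width) {
        List<Rectangle2D> bands = new ArrayList<>();
        for (int i = 0; i + 1 < rowYs.size(); i++) {
            int top = rowYs.get(i);
            int bottom = rowYs.get(i + 1);
            bands.add(new Rectangle2D.Double(x, top, width, bottom - top));
        }
        return bands;
    }
}
```

To capture the final row, a caller would need to append a bottom boundary such as the page height before calling the helper.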
-
getPageOffset
public int getPageOffset()
Gets the pageOffset
- Returns:
- an int specifying the pageOffset
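Since pageOffset is described as a shift to the right relative to a left-leaning page, one plausible use is to move column boundaries measured on a left-leaning page onto the current page. The sketch below assumes that use. ColumnShift and its boundary values are hypothetical and not part of this class.

```java
class ColumnShift {
    // Shifts column boundaries measured on a left-leaning page by the detected offset.
    static int[] shift(int[] leftLeaningBounds, int pageOffset) {
        int[] shifted = new int[leftLeaningBounds.length];
        for (int i = 0; i < shifted.length; i++) {
            shifted[i] = leftLeaningBounds[i] + pageOffset;
        }
        return shifted;
    }
}
```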
-
showGlyph
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3) throws java.io.IOException
- Overrides:
- showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
- Throws:
- java.io.IOException
-