Package FLOTF

Class PDFDimensionFinder

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private boolean firstChar
      Indicates whether processTextPosition is on the first character
      private int pageOffset
      Number of pixels by which the page is shifted to the right relative to a left-leaning page
      private java.util.ArrayList<java.lang.Integer> rowYCoordinates
      Stores the Y-coordinates of the rows in the table
      • Fields inherited from class org.apache.pdfbox.text.PDFTextStripper

        charactersByArticle, document, LINE_SEPARATOR, output
    • Constructor Summary

      Constructors 
      Constructor Description
      PDFDimensionFinder()
      Constructs a PDFDimensionFinder
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected void findOffset(org.apache.pdfbox.text.TextPosition pos)
      Determines the pageOffset from the position of the given TextPosition character
      protected void findRowYCoordinates(org.apache.pdfbox.text.TextPosition pos)
      Determines the rowYCoordinates from the Y-coordinates of the entries in the first column
      int getPageOffset()
      Gets the pageOffset
      java.util.ArrayList<java.lang.Integer> getRowYCoordinates()
      Gets the rowYCoordinates
      protected void processTextPosition(org.apache.pdfbox.text.TextPosition pos)
      Processes the PDF character by character: determines the page's lean from the first character, and examines every character to decide which Y-coordinates to add to rowYCoordinates
      protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, org.apache.pdfbox.util.Vector arg3)
      • Methods inherited from class org.apache.pdfbox.text.PDFTextStripperByArea

        addRegion, extractRegions, getRegions, getTextForRegion, removeRegion, setShouldSeparateByBeads, writePage
      • Methods inherited from class org.apache.pdfbox.text.PDFTextStripper

        endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
      • Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine

        addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • rowYCoordinates

        private java.util.ArrayList<java.lang.Integer> rowYCoordinates
        Stores the Y-coordinates of the rows in the table
      • firstChar

        private boolean firstChar
        Indicates whether processTextPosition is on the first character
      • pageOffset

        private int pageOffset
        Number of pixels by which the page is shifted to the right relative to a left-leaning page
    • Constructor Detail

      • PDFDimensionFinder

        public PDFDimensionFinder()
                           throws java.io.IOException
        Constructs a PDFDimensionFinder
        Throws:
        java.io.IOException - if the PDFTextStripperByArea superclass fails to initialize
    • Method Detail

      • processTextPosition

        protected void processTextPosition(org.apache.pdfbox.text.TextPosition pos)
        Processes the PDF character by character: determines the page's lean from the first character, and examines every character to decide which Y-coordinates to add to rowYCoordinates
        Overrides:
        processTextPosition in class org.apache.pdfbox.text.PDFTextStripperByArea
        Parameters:
        pos - the TextPosition object representing the current character
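        The control flow described for processTextPosition can be sketched without PDFBox. This is a minimal illustration, not the FLOTF source: the class DispatchSketch, the method processCharacter, and the calls list are hypothetical, and it assumes firstChar starts as true and is cleared once the first character has been handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PDFDimensionFinder; not the actual implementation.
final class DispatchSketch {
    // Assumption: true until the first character has been processed.
    private boolean firstChar = true;

    // Records which step ran for each character, for illustration only.
    final List<String> calls = new ArrayList<>();

    // Stand-in for processTextPosition; x and y replace the TextPosition argument.
    void processCharacter(float x, float y) {
        if (firstChar) {
            calls.add("findOffset");          // only the first character sets the offset
            firstChar = false;
        }
        calls.add("findRowYCoordinates");     // every character is checked for a new row
    }

    public static void main(String[] args) {
        DispatchSketch sketch = new DispatchSketch();
        sketch.processCharacter(72f, 100f);
        sketch.processCharacter(80f, 100f);
        // prints [findOffset, findRowYCoordinates, findRowYCoordinates]
        System.out.println(sketch.calls);
    }
}
```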
      • findOffset

        protected void findOffset(org.apache.pdfbox.text.TextPosition pos)
        Determines the pageOffset from the position of the given TextPosition character
        Parameters:
        pos - the TextPosition object representing the current character
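        One way such an offset could be computed is to compare the X position of the first character with the X position it would have on a left-leaning page. The sketch below assumes exactly that. The class PageOffsetSketch and the reference value LEFT_LEANING_X are hypothetical, because the documentation does not state how the offset is derived.

```java
// Hypothetical illustration of an offset calculation; not the actual implementation.
final class PageOffsetSketch {
    // Assumed X position of the first character on a left-leaning page.
    // The real reference value is not documented.
    static final int LEFT_LEANING_X = 36;

    // Returns how far to the right of the reference position the first character sits,
    // rounded to a whole number because pageOffset is declared as an int.
    static int findOffset(float firstCharX) {
        return Math.round(firstCharX) - LEFT_LEANING_X;
    }

    public static void main(String[] args) {
        System.out.println(findOffset(36.2f)); // prints 0
        System.out.println(findOffset(54.0f)); // prints 18
    }
}
```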
      • findRowYCoordinates

        protected void findRowYCoordinates(org.apache.pdfbox.text.TextPosition pos)
        Determines the rowYCoordinates from the Y-coordinates of the entries in the first column
        Parameters:
        pos - the TextPosition object representing the current character
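        Collecting row positions from the first column could look like the sketch below. It assumes a character belongs to the first column when its X position, after subtracting the page offset, falls at or left of a fixed boundary, and that each rounded Y value is stored once. The class RowYSketch and the boundary FIRST_COLUMN_MAX_X are hypothetical and are not taken from the FLOTF source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of first-column row detection; not the actual implementation.
final class RowYSketch {
    // Assumed right edge of the first column on a page with no offset.
    static final int FIRST_COLUMN_MAX_X = 100;

    final List<Integer> rowYCoordinates = new ArrayList<>();
    private final int pageOffset;

    RowYSketch(int pageOffset) {
        this.pageOffset = pageOffset;
    }

    // Stand-in for findRowYCoordinates; x and y replace the TextPosition argument.
    void findRowYCoordinates(float x, float y) {
        boolean inFirstColumn = x - pageOffset <= FIRST_COLUMN_MAX_X;
        int roundedY = Math.round(y);
        // Record each row once, however many characters its first-column entry has.
        if (inFirstColumn && !rowYCoordinates.contains(roundedY)) {
            rowYCoordinates.add(roundedY);
        }
    }

    public static void main(String[] args) {
        RowYSketch sketch = new RowYSketch(0);
        sketch.findRowYCoordinates(40f, 100f);   // first column, new row
        sketch.findRowYCoordinates(48f, 100f);   // same row, ignored
        sketch.findRowYCoordinates(250f, 100f);  // another column, ignored
        sketch.findRowYCoordinates(40f, 118.6f); // first column, next row
        System.out.println(sketch.rowYCoordinates); // prints [100, 119]
    }
}
```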
      • getRowYCoordinates

        public java.util.ArrayList<java.lang.Integer> getRowYCoordinates()
        Gets the rowYCoordinates
        Returns:
        an ArrayList containing the Y-coordinates of the rows in the table
      • getPageOffset

        public int getPageOffset()
        Gets the pageOffset
        Returns:
        an int specifying the pageOffset
      • showGlyph

        protected void showGlyph(org.apache.pdfbox.util.Matrix arg0,
                                 org.apache.pdfbox.pdmodel.font.PDFont arg1,
                                 int arg2,
                                 org.apache.pdfbox.util.Vector arg3)
                          throws java.io.IOException
        Overrides:
        showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
        Throws:
        java.io.IOException