Class JavadocMetadataScraper

    • Field Detail

      • PROPERTY_TAG

        private static final Pattern PROPERTY_TAG
        Regular expression for property location in class-level javadocs.
      • TYPE_TAG

        private static final Pattern TYPE_TAG
        Regular expression for property type location in class-level javadocs.
      • VALIDATION_TYPE_TAG

        private static final Pattern VALIDATION_TYPE_TAG
        Regular expression for property validation type location in class-level javadocs.
      • DEFAULT_VALUE_TAG

        private static final Pattern DEFAULT_VALUE_TAG
        Regular expression for property default value location in class-level javadocs.
      • EXAMPLES_TAG

        private static final Pattern EXAMPLES_TAG
        Regular expression for check example location in class-level javadocs.
      • PARENT_TAG

        private static final Pattern PARENT_TAG
        Regular expression for module parent location in class-level javadocs.
      • VIOLATION_MESSAGES_TAG

        private static final Pattern VIOLATION_MESSAGES_TAG
        Regular expression for module violation messages location in class-level javadocs.
      • TOKEN_TEXT_PATTERN

        private static final Pattern TOKEN_TEXT_PATTERN
        Regular expression for detecting ANTLR tokens (e.g. CLASS_DEF).
      • DESC_CLEAN

        private static final Pattern DESC_CLEAN
        Regular expression for removal of "-" present at the beginning of texts.
      • FILE_SEPARATOR_PATTERN

        private static final Pattern FILE_SEPARATOR_PATTERN
        Regular expression for file separator corresponding to the host OS.
      • PROPERTIES_TO_NOT_WRITE

        private static final Set<String> PROPERTIES_TO_NOT_WRITE
        Set of faulty property default values that should not be written to the XML metadata files.
      • scrapingViolationMessageList

        private boolean scrapingViolationMessageList
        Boolean variable which lets us know whether the violation message section is currently being scraped.
      • toScan

        private boolean toScan
        Boolean variable which lets us know whether we should scan and scrape the current javadoc or not. Since we need only class level javadoc, it becomes true at its root and false after encountering JavadocTokenTypes.SINCE_LITERAL.
      • rootNode

        private DetailNode rootNode
        DetailNode pointing to the root node of the class level javadoc of the class.
      • propertySectionStartIdx

        private int propertySectionStartIdx
        Child number of the property section node, where parent is the class level javadoc root node.
      • exampleSectionStartIdx

        private int exampleSectionStartIdx
        Child number of the example section node, where parent is the class level javadoc root node.
      • parentSectionStartIdx

        private int parentSectionStartIdx
        Child number of the parent section node, where parent is the class level javadoc root node.
      • writeXmlOutput

        private boolean writeXmlOutput
        Control whether to write XML output or not.
    • Method Detail

      • setWriteXmlOutput

        public final void setWriteXmlOutput​(boolean writeXmlOutput)
        Setter to control whether to write XML output or not.
        Parameters:
        writeXmlOutput - whether to write XML output or not.
      • scrapeContent

        private void scrapeContent​(DetailNode ast)
        Method containing the core logic of scraping. It keeps track of which phase of scraping we are in and calls the other subroutines accordingly.
        Parameters:
        ast - javadoc ast
      • createProperties

        private static ModulePropertyDetails createProperties​(DetailNode nodeLi)
        Create the modulePropertyDetails content.
        Parameters:
        nodeLi - list item javadoc node
        Returns:
        modulePropertyDetail object for the corresponding property
      • getTagTextFromProperty

        private static String getTagTextFromProperty​(DetailNode nodeLi,
                                                     DetailNode propertyMeta)
        Get tag text from property data.
        Parameters:
        nodeLi - javadoc li item node
        propertyMeta - property javadoc node
        Returns:
        property metadata text
      • cleanDefaultTokensText

        private static String cleanDefaultTokensText​(String initialText)
        Clean up the default token text by removing hyperlinks, and only keeping token type text.
        Parameters:
        initialText - unclean text
        Returns:
        clean text
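
        A minimal sketch of this kind of cleanup, not the actual implementation. It assumes a token name is a run of two or more uppercase letters or underscores, and that duplicates (for example the same name inside a link target and its label) should be collapsed. The class name, method name and pattern below are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenTextCleanupSketch {

    // Assumed token shape, e.g. CLASS_DEF. The real TOKEN_TEXT_PATTERN may differ.
    private static final Pattern TOKEN_NAME = Pattern.compile("[A-Z_]{2,}");

    private TokenTextCleanupSketch() {
    }

    /** Keeps only the token names found in the text, in first-seen order. */
    static String keepTokenNames(String initialText) {
        // LinkedHashSet drops repeated names while preserving encounter order.
        final Set<String> tokens = new LinkedHashSet<>();
        final Matcher matcher = TOKEN_NAME.matcher(initialText);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return String.join(",", tokens);
    }
}
```

        Under these assumptions, hyperlink markup around a token name is discarded because it contains no run of uppercase characters other than the token name itself.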
      • constructSubTreeText

        private static String constructSubTreeText​(DetailNode node,
                                                   int childLeftLimit,
                                                   int childRightLimit)
        Performs a DFS of the subtree with a node as the root and constructs the text of that tree, ignoring JavadocToken texts.
        Parameters:
        node - root node of subtree
        childLeftLimit - the left index of root children from where to scan
        childRightLimit - the right index of root children till where to scan
        Returns:
        constructed text of subtree
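
        The traversal described above can be sketched as follows. This is an illustration under stated assumptions: Node is a hypothetical stand-in for DetailNode, the ignored flag stands in for "JavadocToken texts", and both child limits are treated as inclusive.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class SubTreeTextSketch {

    /** Hypothetical tree node; not the real DetailNode interface. */
    static final class Node {
        final String text;
        final boolean ignored;
        final List<Node> children;

        Node(String text, boolean ignored, Node... children) {
            this.text = text;
            this.ignored = ignored;
            this.children = Arrays.asList(children);
        }
    }

    private SubTreeTextSketch() {
    }

    /** Concatenates leaf texts below root, visiting only root children in [left, right]. */
    static String constructText(Node root, int childLeftLimit, int childRightLimit) {
        final StringBuilder result = new StringBuilder();
        final Deque<Node> stack = new ArrayDeque<>();
        final int last = Math.min(childRightLimit, root.children.size() - 1);
        // Push in reverse so the leftmost child in range is popped first.
        for (int i = last; i >= Math.max(childLeftLimit, 0); i--) {
            stack.push(root.children.get(i));
        }
        while (!stack.isEmpty()) {
            final Node node = stack.pop();
            if (node.children.isEmpty()) {
                if (!node.ignored) {
                    result.append(node.text);
                }
            } else {
                for (int i = node.children.size() - 1; i >= 0; i--) {
                    stack.push(node.children.get(i));
                }
            }
        }
        return result.toString();
    }
}
```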
      • getDescriptionText

        private String getDescriptionText()
        Create the description text. The starting index is 0, and the ending index is the first valid non-zero index among propertySectionStartIdx, exampleSectionStartIdx and parentSectionStartIdx, checked in that order.
        Returns:
        description text
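
        The choice of the ending index can be sketched like this. It is a sketch only: it assumes that a section which was not found holds a non-positive value, and the class and method names are hypothetical.

```java
final class DescriptionBoundsSketch {

    private DescriptionBoundsSketch() {
    }

    /**
     * Returns the first positive index among the three section start indexes,
     * checked in the order property, example, parent; -1 if none is set.
     */
    static int descriptionEndIndex(int propertyIdx, int exampleIdx, int parentIdx) {
        for (int candidate : new int[] {propertyIdx, exampleIdx, parentIdx}) {
            if (candidate > 0) {
                return candidate;
            }
        }
        return -1;
    }
}
```

        For example, with a missing property section and sections starting at 9 and 12, the description would cover the root children before index 9.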
      • getPropertyDefaultText

        private static String getPropertyDefaultText​(DetailNode nodeLi,
                                                     DetailNode defaultValueNode)
        Create property default text, which is either normal property value or list of tokens.
        Parameters:
        nodeLi - list item javadoc node
        defaultValueNode - default value node
        Returns:
        default property text
      • getViolationMessages

        private static String getViolationMessages​(DetailNode nodeLi)
        Get the violation message text for a specific key from the list item.
        Parameters:
        nodeLi - list item javadoc node
        Returns:
        violation message key text
      • getTextFromTag

        private static String getTextFromTag​(DetailNode nodeTag)
        Get text from JavadocTokenTypes.JAVADOC_INLINE_TAG.
        Parameters:
        nodeTag - target javadoc tag
        Returns:
        text contained by the tag
      • getFirstChildOfType

        private static Optional<DetailNode> getFirstChildOfType(DetailNode node,
                                                                int tokenType,
                                                                int offset)
        Returns the first child node which matches the provided TokenType and has the children index after the offset value.
        Parameters:
        node - parent node
        tokenType - token type to match
        offset - children array index offset
        Returns:
        the first child satisfying the conditions
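
        A minimal sketch of such a lookup, using a hypothetical Node class instead of DetailNode. Whether a child exactly at the offset qualifies is not stated above; this sketch assumes it does (index >= offset).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class FirstChildOfTypeSketch {

    /** Hypothetical tree node carrying only a token type and children. */
    static final class Node {
        final int type;
        final List<Node> children;

        Node(int type, Node... children) {
            this.type = type;
            this.children = Arrays.asList(children);
        }
    }

    private FirstChildOfTypeSketch() {
    }

    /** Finds the first child of the given type at or after the offset (assumed inclusive). */
    static Optional<Node> firstChildOfType(Node parent, int tokenType, int offset) {
        for (int i = Math.max(offset, 0); i < parent.children.size(); i++) {
            final Node child = parent.children.get(i);
            if (child.type == tokenType) {
                return Optional.of(child);
            }
        }
        return Optional.empty();
    }
}
```

        Returning Optional keeps the "no such child" case explicit, so callers do not need a null check.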
      • getText

        private static String getText​(DetailNode parentNode)
        Get joined text from all text children nodes.
        Parameters:
        parentNode - parent node
        Returns:
        the joined text of node
      • getFirstChildOfMatchingText

        private static Optional<DetailNode> getFirstChildOfMatchingText(DetailNode node,
                                                                        Pattern pattern)
        Get first child of parent node matching the provided pattern.
        Parameters:
        node - parent node
        pattern - pattern to match against
        Returns:
        the first child node matching the condition
      • getParent

        private static DetailAST getParent​(DetailAST commentBlock)
        Returns parent node, removing modifier/annotation nodes.
        Parameters:
        commentBlock - child node.
        Returns:
        parent node.
      • getParentIndexOf

        private static int getParentIndexOf​(DetailNode node)
        Traverse parents until we reach a child of the root node (JavadocTokenTypes.JAVADOC) and return its index.
        Parameters:
        node - subtree child node
        Returns:
        root node child index
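
        The upward walk can be sketched as below. Node is a hypothetical class with a parent link and an index among its siblings; the sketch assumes the root is the only node without a parent.

```java
final class ParentIndexSketch {

    /** Hypothetical node with a parent link and its index among siblings. */
    static final class Node {
        final Node parent;
        final int index;

        Node(Node parent, int index) {
            this.parent = parent;
            this.index = index;
        }
    }

    private ParentIndexSketch() {
    }

    /** Climbs until the current node is a direct child of the root, then returns its index. */
    static int rootChildIndexOf(Node node) {
        Node current = node;
        // Stop when the parent is the root, i.e. the parent itself has no parent.
        while (current.parent != null && current.parent.parent != null) {
            current = current.parent;
        }
        return current.index;
    }
}
```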
      • getParentText

        private static String getParentText​(DetailNode nodeParagraph)
        Get module parent text from paragraph javadoc node.
        Parameters:
        nodeParagraph - paragraph javadoc node
        Returns:
        parent text
      • getModuleType

        private ModuleType getModuleType()
        Get module type (check/filter/filefilter) based on file name.
        Returns:
        module type
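
        One way such a file-name-based classification might look. The suffix rules here are an assumption for illustration (names ending in "FileFilter" are file filters, names ending in "Filter" are filters, everything else is a check), and the Kind enum is a hypothetical stand-in for ModuleType.

```java
final class ModuleTypeSketch {

    /** Hypothetical stand-in for ModuleType. */
    enum Kind { CHECK, FILTER, FILEFILTER }

    private ModuleTypeSketch() {
    }

    /** Classifies a module by the suffix of its simple file name (assumed convention). */
    static Kind fromFileName(String fileName) {
        final String simpleName;
        if (fileName.endsWith(".java")) {
            simpleName = fileName.substring(0, fileName.length() - ".java".length());
        } else {
            simpleName = fileName;
        }
        final Kind result;
        // "FileFilter" must be tested first because it also ends with "Filter".
        if (simpleName.endsWith("FileFilter")) {
            result = Kind.FILEFILTER;
        } else if (simpleName.endsWith("Filter")) {
            result = Kind.FILTER;
        } else {
            result = Kind.CHECK;
        }
        return result;
    }
}
```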
      • getModuleSimpleName

        private String getModuleSimpleName()
        Extract simple file name from the whole file path name.
        Returns:
        simple module name
      • getPackageName

        private static String getPackageName​(String filePath)
        Retrieve package name of module from the absolute file path.
        Parameters:
        filePath - absolute file path
        Returns:
        package name
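
        A sketch of deriving a package name from a file path, under the assumption that the source root is a directory named "java" (as in src/main/java). The separator is passed in so the example behaves the same on every OS; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Pattern;

final class PackageNameSketch {

    private PackageNameSketch() {
    }

    /** Joins the directories between the assumed "java" source root and the file with dots. */
    static String packageNameOf(String filePath, String separator) {
        // Pattern.quote keeps a backslash separator from being read as a regex escape.
        final String[] parts = Pattern.compile(Pattern.quote(separator)).split(filePath);
        final Deque<String> packageParts = new ArrayDeque<>();
        // Start at the directory containing the file (the last part is the file name).
        for (int i = parts.length - 2; i >= 0; i--) {
            if ("java".equals(parts[i])) {
                break;
            }
            packageParts.addFirst(parts[i]);
        }
        return String.join(".", packageParts);
    }
}
```

        In real use the separator would typically be java.io.File.separator, which matches the host OS in the same spirit as FILE_SEPARATOR_PATTERN.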
      • resetModuleDetailsStore

        public static void resetModuleDetailsStore()
        Reset the module detail store of any previous information.
      • isTopLevelClassJavadoc

        private boolean isTopLevelClassJavadoc()
        Check if the current javadoc block comment AST corresponds to the top-level class as we only want to scrape top-level class javadoc.
        Returns:
        true if the current AST corresponds to top level class
      • isExamplesText

        private static boolean isExamplesText​(DetailNode ast)
        Checks whether the paragraph node corresponds to the example section.
        Parameters:
        ast - javadoc paragraph node
        Returns:
        true if the section matches the example section marker
      • isPropertyList

        private static boolean isPropertyList​(DetailNode nodeLi)
        Checks whether the list item node is part of a property list.
        Parameters:
        nodeLi - JavadocTokenTypes.LI node
        Returns:
        true if the node is part of a property list
      • isViolationMessagesText

        private static boolean isViolationMessagesText​(DetailNode nodeParagraph)
        Checks whether the JavadocTokenTypes.PARAGRAPH node is referring to the violation message keys javadoc segment.
        Parameters:
        nodeParagraph - paragraph javadoc node
        Returns:
        true if paragraph node contains the violation message keys text
      • isParentText

        private static boolean isParentText​(DetailNode nodeParagraph)
        Checks whether the JavadocTokenTypes.PARAGRAPH node is referring to the parent javadoc segment.
        Parameters:
        nodeParagraph - paragraph javadoc node
        Returns:
        true if paragraph node contains the parent text
      • isChildNodeTextMatches

        private static boolean isChildNodeTextMatches​(DetailNode ast,
                                                      Pattern pattern)
        Checks whether the first child JavadocTokenTypes.TEXT node matches the given pattern.
        Parameters:
        ast - parent javadoc node
        pattern - pattern to match
        Returns:
        true if one of child text nodes matches pattern