Class SearchIndexGenerator

java.lang.Object
com.puppycrawl.tools.checkstyle.site.SearchIndexGenerator

public final class SearchIndexGenerator extends Object
Generates search-index.json from the Checkstyle XDoc source files.

This is a plain Java class with a main() method; no Maven plugin API is required. It is invoked by exec-maven-plugin during the process-classes phase so that the index is ready when Maven Site copies static resources.

Output is written as a JSON file. The search widget fetches this file using the fetch API and parses it to populate the search index.

Key design decisions

  • No duplicates. Only plain .xml files are processed for check/filter/filefilter directories. The .xml.template and .xml.vm siblings are pre-render source files that would produce identical URLs and duplicate entries. A secondary URL-keyed dedup guard is also applied across the entire output list.
  • Identifiable example titles. Both -config and -code example paragraphs are indexed. Their titles use the pattern "<CheckName>: Example1 [config]" and "<CheckName>: Example1 [code]" so users can distinguish a configuration snippet from its matching Java code example in search results.
  • Full general-page indexing. Each meaningful <section> in general documentation pages (e.g. config_system_properties, writingchecks, cmdline) is indexed as its own entry with the full section text used for keyword extraction - not just the first sentence. This makes page-internal headings discoverable.
  • Disambiguated generic titles. Structural section names that are repeated across many pages (e.g. "Overview", "Debug", "Contributing") are prefixed with the page title, yielding e.g. "Eclipse IDE: Debug" instead of a bare "Debug" that collides with "IntelliJ IDE: Debug".
  • Junk pages excluded. Release notes, auto-generated style coverage reports and bare category aggregator stubs are skipped.

Usage (called by exec-maven-plugin in pom.xml):

   java SearchIndexGenerator <xdocsDir> <outputFilePath>
   java SearchIndexGenerator src/site/xdoc target/site/search-index.json
 

  • Method Details

    • main

      public static void main(String... args) throws IOException
      Main entry point called by exec-maven-plugin.
      Parameters:
      args - args[0] = path to src/xdocs, args[1] = output file path
      Throws:
      IOException - on file write failure
      IllegalArgumentException - if args are missing
      IllegalStateException - if xdocsDir is missing
    • execute

      private void execute(String... args) throws IOException
      Internal execution method to avoid static context for the logger.
      Parameters:
      args - args[0] = path to src/xdocs, args[1] = output file path
      Throws:
      IOException - on file write failure
      IllegalArgumentException - if args are missing
      IllegalStateException - if xdocsDir is missing
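
      A minimal sketch of the argument handling documented for main and execute. The class GeneratorArgs and its error messages are hypothetical; only the exception types come from the documentation above.

```java
import java.io.File;

// Hypothetical helper: validates the two positional arguments of SearchIndexGenerator.
final class GeneratorArgs {
    final File xdocsDir;
    final File outputFile;

    private GeneratorArgs(File xdocsDir, File outputFile) {
        this.xdocsDir = xdocsDir;
        this.outputFile = outputFile;
    }

    static GeneratorArgs parse(String... args) {
        if (args == null || args.length < 2) {
            // Missing arguments: mirrors the documented IllegalArgumentException.
            throw new IllegalArgumentException(
                "Usage: SearchIndexGenerator <xdocsDir> <outputFilePath>");
        }
        File xdocsDir = new File(args[0]);
        if (!xdocsDir.isDirectory()) {
            // Missing xdocs directory: mirrors the documented IllegalStateException.
            throw new IllegalStateException("xdocs directory not found: " + xdocsDir);
        }
        return new GeneratorArgs(xdocsDir, new File(args[1]));
    }
}
```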
    • processChecksDirectory

      private void processChecksDirectory(File checksDir, File xdocsDir)
      Walks src/xdocs/checks/ and processes each category subdirectory.
      Parameters:
      checksDir - the checks root directory
      xdocsDir - the xdocs root (used for URL building)
    • processDirectory

      private void processDirectory(File dir, File xdocsDir, String category, String type)
      Processes all plain .xml files in a directory (non-recursive). index.xml files and any file whose name ends with .xml.template or .xml.vm are skipped.

      Skipping templates is critical: every check page has a sibling *.xml.template file that resolves to the same HTML URL. Without this filter both files would be processed, producing two identical (or near-identical) main entries plus doubled example and property entries for every check.

      For each plain .xml file, the main check/filter entry, per-example entries (both config and code), and per-property entries are added.

      Parameters:
      dir - directory to scan
      xdocsDir - xdocs root (used for URL building)
      category - category label for all entries in this directory
      type - document type ("Check", "Filter", "File Filter")
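
      The file filter described for processDirectory could look roughly like the sketch below. XdocFileFilter is a hypothetical helper, and sorting by file name is an assumption added so that the generated index has a stable order between builds.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: selects the plain .xml pages that should be indexed.
final class XdocFileFilter {
    private XdocFileFilter() {
    }

    /** True only for plain .xml pages; index.xml and template siblings are rejected. */
    static boolean isIndexable(String fileName) {
        if ("index.xml".equals(fileName)) {
            return false;
        }
        // "Foo.xml.template" and "Foo.xml.vm" do not end with ".xml" anyway,
        // but they are rejected explicitly to make the intent obvious.
        if (fileName.endsWith(".xml.template") || fileName.endsWith(".xml.vm")) {
            return false;
        }
        return fileName.endsWith(".xml");
    }

    /** Lists indexable files directly inside dir (non-recursive), sorted by name. */
    static List<File> indexableFiles(File dir) {
        File[] files = dir.listFiles(f -> f.isFile() && isIndexable(f.getName()));
        List<File> result = new ArrayList<>();
        if (files != null) {
            result.addAll(Arrays.asList(files));
        }
        result.sort((a, b) -> a.getName().compareTo(b.getName()));
        return result;
    }
}
```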
    • processGeneralPages

      private void processGeneralPages(File xdocsDir)
      Adds entries for the top-level general documentation pages.

      Release notes, auto-generated style coverage reports and bare category aggregator stubs are skipped. Each remaining page is indexed per top-level <section>, using the section's full text content for keyword extraction so page-internal headings are fully discoverable. Generic structural section names (see GENERIC_SECTION_NAMES) are disambiguated by prefixing the page's own title.

      Parameters:
      xdocsDir - the xdocs root directory
    • buildMainEntry

      private static SearchIndexEntry buildMainEntry(Document doc, File xmlFile, String category, String type, String baseUrl)
      Builds the main search entry representing an entire check/filter document.
      Parameters:
      doc - the parsed XDoc document
      xmlFile - the source file
      category - category label for this file's entry
      type - document type ("Check", "Filter", etc.)
      baseUrl - the page url without anchor
      Returns:
      an entry representing the document
    • buildGeneralPageEntries

      Builds one search entry per top-level <section> in a general documentation page, using each section's full text for keyword extraction so that page-internal content is fully discoverable.

      Generic structural section names (see GENERIC_SECTION_NAMES) are disambiguated as "<page title>: <section name>" to avoid collisions across pages (e.g. "Eclipse IDE: Debug" vs "IntelliJ IDE: Debug").

      Parameters:
      xmlFile - the XDoc source file to parse
      Returns:
      list of entries, one per top-level section found
      Throws:
      ParserConfigurationException - on XML parser setup failure
      SAXException - on XML parse error
      IOException - on file read failure
    • extractExampleEntries

      private static List<SearchIndexEntry> extractExampleEntries(Document doc, String baseUrl, String category)
      Extracts per-example search entries from a check/filter document.

      Both -config and -code example paragraphs are indexed so users can find both the configuration snippet and the corresponding Java code example independently in search results.

      Titles use the pattern "<CheckName>: Example1 [config]" and "<CheckName>: Example1 [code]" to make the type immediately visible in search result listings without needing to open the page.

      Confirmed XDoc template structure for the Examples subsection:

         <p id="Example1-config">To configure the check...</p>
         <macro name="example"><param name="type" value="config"/></macro>
         <p id="Example1-code">Example:</p>
         <macro name="example"><param name="type" value="code"/></macro>
       
      Parameters:
      doc - the parsed XDoc document
      baseUrl - the page url without anchor
      category - category label
      Returns:
      list of per-example entries (both config and code); empty if none found
    • buildExampleEntry

      private static SearchIndexEntry buildExampleEntry(Element paragraph, String checkName, String baseUrl, String category)
      Builds a single example entry from a paragraph element.
      Parameters:
      paragraph - the paragraph element containing the example
      checkName - the name of the check
      baseUrl - the base URL for the page
      category - the category label
      Returns:
      a SearchIndexEntry if the paragraph matches the example pattern, null otherwise
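
      A sketch of the id matching and title construction described for extractExampleEntries and buildExampleEntry. It assumes example paragraph ids always have the form ExampleN-config or ExampleN-code and that the rendered page keeps the paragraph id as its anchor; ExampleTitles is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns example paragraph ids into search titles and deep links.
final class ExampleTitles {
    /** Matches paragraph ids such as "Example1-config" and "Example12-code". */
    private static final Pattern EXAMPLE_ID = Pattern.compile("(Example\\d+)-(config|code)");

    private ExampleTitles() {
    }

    /** Returns "<CheckName>: ExampleN [config|code]", or null if the id is not an example id. */
    static String titleFor(String paragraphId, String checkName) {
        Matcher m = EXAMPLE_ID.matcher(paragraphId);
        if (!m.matches()) {
            return null;
        }
        return checkName + ": " + m.group(1) + " [" + m.group(2) + "]";
    }

    /** Deep link to the paragraph, assuming the id survives as the HTML anchor. */
    static String urlFor(String baseUrl, String paragraphId) {
        return baseUrl + "#" + paragraphId;
    }
}
```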
    • extractPropertyEntries

      private static List<SearchIndexEntry> extractPropertyEntries(Document doc, String baseUrl, String category)
      Extracts per-property search entries from a check/filter document.

      Each row of the Properties table is indexed under the title "<CheckName>: <propertyName>" and linked to the property's own anchor on the page.

      Parameters:
      doc - the parsed XDoc document
      baseUrl - the page url without anchor
      category - category label
      Returns:
      list of per-property entries; empty if none found
    • extractPropertiesFromRows

      private static void extractPropertiesFromRows(Element propertiesSubsection, String checkName, String baseUrl, String category, List<SearchIndexEntry> propertyEntries)
      Extracts property entries from table rows and adds them to the list.
      Parameters:
      propertiesSubsection - the properties subsection element
      checkName - the check name
      baseUrl - the page url without anchor
      category - category label
      propertyEntries - the list to add entries to
    • processPropertyRow

      private static void processPropertyRow(NodeList cells, String checkName, String baseUrl, String category, List<SearchIndexEntry> propertyEntries)
      Processes a single property row and adds an entry if valid.
      Parameters:
      cells - the table cells
      checkName - the check name
      baseUrl - the page url without anchor
      category - category label
      propertyEntries - the list to add entries to
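
      A sketch of the row processing for the Properties table, assuming (not confirmed here) that header rows contain only <th> cells and that the first <td> of each data row holds the property name. PropertyRows is hypothetical, and it returns titles rather than full SearchIndexEntry objects to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: builds "<CheckName>: <propertyName>" titles from table rows.
final class PropertyRows {
    private PropertyRows() {
    }

    static List<String> propertyTitles(Element table, String checkName) {
        List<String> titles = new ArrayList<>();
        NodeList rows = table.getElementsByTagName("tr");
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            NodeList cells = row.getElementsByTagName("td");
            // Header rows use <th> only, so they have no <td> cells and are skipped.
            if (cells.getLength() == 0) {
                continue;
            }
            String name = cells.item(0).getTextContent().trim();
            if (!name.isEmpty()) {
                titles.add(checkName + ": " + name);
            }
        }
        return titles;
    }

    /** Parses a small XML string; used only to demonstrate propertyTitles. */
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
    }
}
```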
    • addIfNew

      private void addIfNew(SearchIndexEntry entry)
      Adds an entry to the output list only if its URL has not been seen before. This is a secondary guard that catches any duplicates that slip through the primary filter (only processing plain .xml files), e.g. if a check has the same example paragraph id repeated across two sections.
      Parameters:
      entry - the entry to conditionally add
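
      The URL-keyed guard can be sketched with a HashSet of seen URLs, where the first entry for a URL wins. DedupIndex and its Entry type are hypothetical stand-ins for the generator's own list and SearchIndexEntry; unlike the documented void method, this addIfNew returns whether the entry was added.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the generator's output list plus its URL guard.
final class DedupIndex {
    /** Minimal stand-in for SearchIndexEntry: only the fields this sketch needs. */
    static final class Entry {
        final String title;
        final String url;

        Entry(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final Set<String> seenUrls = new HashSet<>();

    /** Adds the entry only if no earlier entry had the same URL. */
    boolean addIfNew(Entry entry) {
        // Set.add returns false when the URL is already present.
        if (!seenUrls.add(entry.url)) {
            return false;
        }
        entries.add(entry);
        return true;
    }

    List<Entry> entries() {
        return entries;
    }
}
```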
    • findSubsectionByPrefix

      private static Element findSubsectionByPrefix(Element section, String fragment)
      Finds a subsection within a section whose lowercased name contains the given fragment (e.g. "examples" or "propert" to match "Properties").
      Parameters:
      section - the section to search
      fragment - lowercase fragment to match against the subsection name
      Returns:
      the matching subsection element, or null if not found
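
      A sketch of the substring match using the standard org.w3c.dom API. Note that getElementsByTagName searches all descendants, not only direct children; whether the real method restricts the search to direct children is not stated here.

```java
import java.util.Locale;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch of findSubsectionByPrefix; matching is by substring, as documented.
final class SubsectionFinder {
    private SubsectionFinder() {
    }

    /** Returns the first <subsection> whose lowercased name contains fragment, or null. */
    static Element findSubsectionByPrefix(Element section, String fragment) {
        NodeList subsections = section.getElementsByTagName("subsection");
        for (int i = 0; i < subsections.getLength(); i++) {
            Element sub = (Element) subsections.item(i);
            // Locale.ROOT keeps lower-casing independent of the build machine's locale.
            String name = sub.getAttribute("name").toLowerCase(Locale.ROOT);
            if (name.contains(fragment)) {
                return sub;
            }
        }
        return null;
    }
}
```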
    • parseXml

      Parses the XML file into a Document with external entity resolution disabled for security.
      Parameters:
      xmlFile - the XDoc source file
      Returns:
      the parsed Document
      Throws:
      ParserConfigurationException - on XML parser setup failure
      SAXException - on XML parse error
      IOException - on file read failure
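
      One common way to disable external entity resolution with the JDK's DocumentBuilderFactory is shown below; the exact feature set the generator uses is not documented here, so treat this as an assumption-level sketch.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Sketch of a hardened parseXml for XDoc files.
final class SafeXdocParser {
    private SafeXdocParser() {
    }

    static Document parseXml(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Enforce the JAXP secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Never fetch external general/parameter entities or external DTDs.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder().parse(xmlFile);
    }
}
```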
    • extractTitle

      private static String extractTitle(Document doc, File xmlFile, NodeList sections)
      Extracts the document title from the <title> element, falling back to the first non-empty, non-"Content" section name, and finally to a capitalised version of the file name.
      Parameters:
      doc - the document
      xmlFile - the source file
      sections - the list of sections
      Returns:
      the title string, never empty
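
      The three-step fallback can be sketched on plain strings. TitleFallback is hypothetical, stripping a trailing .xml before capitalising is an assumption, and the "Content" comparison is assumed to be case-sensitive.

```java
import java.util.List;

// Hypothetical helper mirroring the extractTitle fallback order.
final class TitleFallback {
    private TitleFallback() {
    }

    /** <title> text, else first non-empty non-"Content" section name, else capitalised file name. */
    static String chooseTitle(String titleElementText, List<String> sectionNames, String fileName) {
        if (titleElementText != null && !titleElementText.trim().isEmpty()) {
            return titleElementText.trim();
        }
        for (String name : sectionNames) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty() && !"Content".equals(trimmed)) {
                return trimmed;
            }
        }
        // Assumption: the ".xml" extension is dropped before capitalising.
        String base = fileName.endsWith(".xml")
            ? fileName.substring(0, fileName.length() - ".xml".length())
            : fileName;
        return capitalise(base);
    }

    /** Uppercases the first character; returns the input unchanged if it is empty. */
    static String capitalise(String input) {
        if (input.isEmpty()) {
            return input;
        }
        return Character.toUpperCase(input.charAt(0)) + input.substring(1);
    }
}
```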
    • extractAggregateDescription

      private static String extractAggregateDescription(NodeList sections)
      Aggregates the description from sections by returning the first non-empty Description subsection found across all sections in the document.
      Parameters:
      sections - list of sections
      Returns:
      description string, possibly empty
    • extractAggregateKeywords

      private static String extractAggregateKeywords(String title, NodeList sections)
      Aggregates keywords from the text of all sections so that the main check entry can be found by terms that appear anywhere in the document, not only in its title or description.
      Parameters:
      title - the document title
      sections - list of sections
      Returns:
      keywords string
    • extractDescription

      private static String extractDescription(Element section)
      Extracts the first sentence of the Description subsection. Returns an empty string if no Description subsection is found.
      Parameters:
      section - the <section> element to search
      Returns:
      first sentence of the description, or empty string
    • derivePageTitle

      private static String derivePageTitle(Document doc, File xmlFile)
      Derives a fallback page title from the document's <title> element or, failing that, from the filename.
      Parameters:
      doc - the parsed document
      xmlFile - the source file
      Returns:
      a non-empty title string
    • disambiguateTitle

      private static String disambiguateTitle(String sectionName, String pageTitle)
      Disambiguates a section title when it is a generic, structurally repeated header (see GENERIC_SECTION_NAMES). Non-generic section names are returned unchanged.
      Parameters:
      sectionName - the raw section name
      pageTitle - the owning page's own title
      Returns:
      either sectionName unchanged, or "<pageTitle>: <sectionName>" if generic
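
      A minimal sketch of the disambiguation rule. Only "Overview", "Debug" and "Contributing" appear in this documentation, so the set below is an illustrative subset, and exact (case-sensitive) matching is an assumption.

```java
import java.util.Set;

// Sketch of disambiguateTitle with an illustrative GENERIC_SECTION_NAMES subset.
final class TitleDisambiguator {
    /** Illustrative subset only; the real GENERIC_SECTION_NAMES contents are not listed here. */
    private static final Set<String> GENERIC_SECTION_NAMES =
        Set.of("Overview", "Debug", "Contributing");

    private TitleDisambiguator() {
    }

    static String disambiguateTitle(String sectionName, String pageTitle) {
        if (GENERIC_SECTION_NAMES.contains(sectionName)) {
            return pageTitle + ": " + sectionName;
        }
        return sectionName;
    }
}
```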
    • doxiaAnchorFor

      private static String doxiaAnchorFor(String sectionName)
      Converts a Doxia <section name="..."> value into the anchor id Doxia generates for it in the rendered HTML by replacing runs of whitespace with single underscores.
      Parameters:
      sectionName - the raw name attribute value
      Returns:
      the anchor id Doxia would render for this section name
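
      The whitespace rule can be expressed with a single regular expression. This sketch implements only that rule (other characters pass through unchanged), and sectionUrl is a hypothetical helper showing how the anchor might be appended to a page URL.

```java
// Sketch of doxiaAnchorFor plus a hypothetical deep-link helper.
final class DoxiaAnchors {
    private DoxiaAnchors() {
    }

    /** Replaces every run of whitespace with a single underscore. */
    static String doxiaAnchorFor(String sectionName) {
        return sectionName.replaceAll("\\s+", "_");
    }

    /** Joins a page URL and a section anchor into a deep link. */
    static String sectionUrl(String pageUrl, String sectionName) {
        return pageUrl + "#" + doxiaAnchorFor(sectionName);
    }
}
```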
    • extractFirstSentenceOrTruncated

      Returns the first sentence of the given text (up to and including the first period), or the text truncated to MAX_DESCRIPTION_LENGTH with an ellipsis if no period is found within range.
      Parameters:
      text - the source text, already whitespace-normalised
      Returns:
      first sentence or truncated text
    • truncate

      private static String truncate(String text, int maxLength)
      Truncates text to the given max length, appending an ellipsis if truncation occurred.
      Parameters:
      text - the text to truncate
      maxLength - maximum length before truncation
      Returns:
      original text if short enough, otherwise truncated with ellipsis
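
      A sketch of extractFirstSentenceOrTruncated together with truncate. The value 200 for MAX_DESCRIPTION_LENGTH and the three-dot ellipsis are assumptions; neither is given in this page.

```java
// Sketch of the description-shortening helpers.
final class DescriptionText {
    /** Hypothetical value; the real MAX_DESCRIPTION_LENGTH is not given in this page. */
    static final int MAX_DESCRIPTION_LENGTH = 200;

    private DescriptionText() {
    }

    static String extractFirstSentenceOrTruncated(String text) {
        int period = text.indexOf('.');
        // Keep the first sentence, period included, if it fits within the limit.
        if (period >= 0 && period < MAX_DESCRIPTION_LENGTH) {
            return text.substring(0, period + 1);
        }
        return truncate(text, MAX_DESCRIPTION_LENGTH);
    }

    static String truncate(String text, int maxLength) {
        if (text.length() <= maxLength) {
            return text;
        }
        // Three ASCII dots are used as the ellipsis here; the real marker may differ.
        return text.substring(0, maxLength) + "...";
    }
}
```

      Because the rule stops at the first period, a description that opens with an abbreviation such as "e.g." is cut at that abbreviation.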
    • buildUrl

      private static String buildUrl(File xmlFile, File xdocsDir)
      Builds the root-relative URL for an XDoc file, without any anchor. Always uses forward slashes regardless of OS.
      Parameters:
      xmlFile - the source XDoc file
      xdocsDir - the xdocs root directory
      Returns:
      root-relative URL string with no anchor
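
      A sketch of buildUrl that relativises the file against the xdocs root and joins the path's name elements with forward slashes, so the result is the same on Windows and Unix. Mapping the .xml extension to .html and omitting a leading slash are assumptions about the site layout.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.StringJoiner;

// Sketch of buildUrl: OS-independent, anchor-free page URL.
final class XdocUrls {
    private XdocUrls() {
    }

    static String buildUrl(File xmlFile, File xdocsDir) {
        Path relative = xdocsDir.toPath().toAbsolutePath().normalize()
            .relativize(xmlFile.toPath().toAbsolutePath().normalize());
        StringJoiner joiner = new StringJoiner("/");
        // Iterating a Path yields its name elements, so no OS separator leaks through.
        for (Path part : relative) {
            joiner.add(part.toString());
        }
        String url = joiner.toString();
        // Assumption: each page.xml is rendered as page.html.
        return url.endsWith(".xml") ? url.substring(0, url.length() - 4) + ".html" : url;
    }
}
```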
    • resolvePageUrl

      private static String resolvePageUrl(File xmlFile, File xdocsDir)
      Resolves the correct URL for a general page file. For config_<category>.xml files that redirect to check category pages, maps to checks/<category>/index.html instead of the file path.
      Parameters:
      xmlFile - the source XDoc file
      xdocsDir - the xdocs root directory
      Returns:
      the resolved URL
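
      A sketch of the config_<category>.xml redirect. Which categories actually redirect is not listed here, so REDIRECTING_CATEGORIES is a hypothetical subset; the pattern also lets pages such as config_system_properties.xml keep their own URL. Unlike the documented method, this version takes the file name and its plain URL as strings.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of resolvePageUrl with a hypothetical list of redirecting categories.
final class GeneralPageUrls {
    /** Hypothetical subset; the real set of redirecting categories is not given here. */
    private static final Set<String> REDIRECTING_CATEGORIES =
        Set.of("annotation", "blocks", "coding", "design", "naming", "whitespace");

    // Underscores are excluded, so config_system_properties.xml never matches.
    private static final Pattern CONFIG_PAGE = Pattern.compile("config_([a-z]+)\\.xml");

    private GeneralPageUrls() {
    }

    /** fileUrl is the plain file-based URL, e.g. the result of buildUrl. */
    static String resolvePageUrl(String fileName, String fileUrl) {
        Matcher m = CONFIG_PAGE.matcher(fileName);
        if (m.matches() && REDIRECTING_CATEGORIES.contains(m.group(1))) {
            return "checks/" + m.group(1) + "/index.html";
        }
        return fileUrl;
    }
}
```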
    • extractKeywordsFromText

      private static String extractKeywordsFromText(String text)
      Extracts keywords from free-form text by splitting on non-word characters and filtering out short words and stop words.
      Parameters:
      text - input text
      Returns:
      comma-separated keyword string (up to MAX_KEYWORDS words)
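
      A sketch of the keyword extraction. The minimum word length, the stop-word list, the value of MAX_KEYWORDS, lower-casing, de-duplication and the ", " separator are all assumptions chosen for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Sketch of extractKeywordsFromText with hypothetical limits and stop words.
final class KeywordExtractor {
    static final int MAX_KEYWORDS = 20;
    static final int MIN_WORD_LENGTH = 3;
    private static final Set<String> STOP_WORDS =
        Set.of("the", "and", "for", "with", "this", "that");

    private KeywordExtractor() {
    }

    static String extractKeywordsFromText(String text) {
        // LinkedHashSet drops repeats while keeping first-seen order.
        Set<String> keywords = new LinkedHashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (keywords.size() == MAX_KEYWORDS) {
                break;
            }
            // The length check also discards the empty token produced by leading punctuation.
            if (word.length() >= MIN_WORD_LENGTH && !STOP_WORDS.contains(word)) {
                keywords.add(word);
            }
        }
        return String.join(", ", keywords);
    }
}
```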
    • writeJson

      private void writeJson(List<SearchIndexEntry> indexEntries, Path outputFilePath) throws IOException
      Writes all index entries to the output file as JSON.
      Parameters:
      indexEntries - the list of entries to serialise
      outputFilePath - the full path to the output file
      Throws:
      IOException - on file write failure
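
      A dependency-free sketch of the JSON output. The library the generator actually uses, and the field names the search widget expects, are not documented here; entries are modelled as ordered string maps, and field names such as title and url in the usage example are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// Sketch of writeJson without a JSON library.
final class JsonIndexWriter {
    private JsonIndexWriter() {
    }

    /** Serialises each entry as a JSON object, preserving the map's iteration order. */
    static String toJson(List<Map<String, String>> entries) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < entries.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, String> field : entries.get(i).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(quote(field.getKey())).append(':').append(quote(field.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    /** Escapes quotes, backslashes and control characters as JSON requires. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    static void writeJson(List<Map<String, String>> entries, Path outputFilePath) throws IOException {
        Path parent = outputFilePath.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(outputFilePath, toJson(entries).getBytes(StandardCharsets.UTF_8));
    }
}
```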
    • capitalise

      private static String capitalise(String input)
      Capitalises the first character of a string.
      Parameters:
      input - the string to capitalise
      Returns:
      string with first character uppercased, or input unchanged if empty