Keyword Matching Algorithm
Keyword Matching & Scoring
The Japanese ATS includes an automated resume analyzer designed to help administrators quickly identify qualified candidates. When an applicant uploads a PDF resume, the system extracts the text and compares it against a standardized dictionary of professional keywords.
The Keyword Dictionary
The algorithm scans for a wide range of professional competencies, specifically tailored for technical and language-oriented roles. The pre-defined dictionary includes:
- Japanese Proficiency: JLPT levels (N1, N2, N3, N4, N5) and general language skills.
- Programming Languages: Java, JavaScript, TypeScript, Python, C++, Ruby, etc.
- Web & Cloud Technologies: React, Node.js, AWS, Azure, Docker, and CI/CD.
- Databases: SQL, PostgreSQL, MongoDB, and others.
- Professional Experience: Titles like "Senior," "Lead," "Architect," and "Manager."
- Education: Degree terms (Bachelor, Master, PhD). These are matched as keywords; the system does not verify the degree itself.
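One way to picture the dictionary is as a map from category to keyword list. The sketch below is illustrative only: the name `KEYWORD_DICTIONARY`, the category keys, and the helper `flattenDictionary` are assumptions, not the system's actual identifiers. It lists only the terms named above, whereas the real dictionary contains more.

```javascript
// Hypothetical layout; the actual structure of the dictionary is not documented here.
// Only terms named in this section are listed; the real dictionary is larger.
const KEYWORD_DICTIONARY = {
  japaneseProficiency: ["N1", "N2", "N3", "N4", "N5"],
  programmingLanguages: ["Java", "JavaScript", "TypeScript", "Python", "C++", "Ruby"],
  webAndCloud: ["React", "Node.js", "AWS", "Azure", "Docker", "CI/CD"],
  databases: ["SQL", "PostgreSQL", "MongoDB"],
  experience: ["Senior", "Lead", "Architect", "Manager"],
  education: ["Bachelor", "Master", "PhD"],
};

// Flatten all categories into one de-duplicated, lowercased keyword list.
function flattenDictionary(dictionary) {
  const all = Object.values(dictionary)
    .flat()
    .map((keyword) => keyword.toLowerCase());
  return [...new Set(all)];
}
```

Grouping by category keeps the list easy to maintain, while the flattened form is what a matcher would iterate over.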
Scoring Mechanism
The Keyword Match Score provides a quantitative measure of how well a resume aligns with the system's target criteria.
- Match Increment: Each unique keyword found in the resume adds 10 percentage points to the total score, so repeated occurrences of the same keyword count only once.
- Maximum Score: The score is capped at 100%.
- Precision Matching: The algorithm uses regular-expression word-boundary detection so that only whole terms match. For example, it distinguishes "Java" from "JavaScript" and correctly identifies terms with symbols like "C++" or "C#".
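The rules above can be sketched as follows. This is a minimal illustration, not the system's actual regex. The function names and the boundary character class are assumptions. A plain `\b` does not work next to symbols (for instance after the final "+" in "C++"), so this sketch uses lookbehind and lookahead assertions instead.

```javascript
// Escape regex metacharacters so "C++" or "Node.js" are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the unique keywords found in the resume text, in dictionary order.
// A keyword matches only when no letter, digit, "+" or "#" touches either side
// (an assumed boundary rule chosen so "Java" does not match inside "JavaScript").
function findMatchedKeywords(resumeText, keywords) {
  const text = resumeText.toLowerCase();
  const matched = new Set();
  for (const keyword of keywords) {
    const needle = escapeRegExp(keyword.toLowerCase());
    const pattern = new RegExp(`(?<![a-z0-9+#])${needle}(?![a-z0-9+#])`);
    if (pattern.test(text)) matched.add(keyword);
  }
  return [...matched];
}

// 10 points per unique keyword, capped at 100.
function keywordMatchScore(matchedKeywords) {
  return Math.min(100, matchedKeywords.length * 10);
}

const resume = "Senior JavaScript developer with C++ and C# experience. Node.js too.";
const found = findMatchedKeywords(resume, ["Java", "JavaScript", "C++", "C#", "C", "Node.js"]);
console.log(found, keywordMatchScore(found));
```

In this example "Java" and "C" are not reported, because each occurrence is part of a longer term, and the four remaining matches give a score of 40.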
Analysis Process
- Text Extraction: Upon submission, a background Edge Function processes the uploaded PDF using pdfjs-dist to convert document layouts into searchable text.
- Normalization: The extracted text and keywords are normalized to lowercase to ensure the matching is case-insensitive.
- Keyword Identification: The system runs a comparison between the resume text and the global dictionary (plus any custom keywords provided).
- Database Update: The resulting score and a list of specific "Matched Keywords" are stored alongside the applicant's profile.
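The extraction and normalization steps might look roughly like the sketch below. It assumes the pdfjs-dist module is passed in as `pdfjsLib`, because the import path differs between versions and runtimes. The helper names are hypothetical. Collapsing whitespace and inserting a space between text items are also assumptions, since the source only states that text is lowercased.

```javascript
// Join the text items of one page into a single string.
// Items returned by getTextContent() carry their text in `str`;
// `hasEOL` marks the end of a line.
function joinTextItems(items) {
  let out = "";
  for (const item of items) {
    if (typeof item.str !== "string") continue; // skip entries without text
    out += item.str + (item.hasEOL ? "\n" : " ");
  }
  return out;
}

// Lowercase the text (and, as an assumption, collapse runs of whitespace)
// so that matching is case-insensitive.
function normalizeText(text) {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// `pdfjsLib` is the imported pdfjs-dist module; `data` is the PDF as a Uint8Array.
async function extractPdfText(pdfjsLib, data) {
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages = [];
  // Page numbers in pdf.js start at 1.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    pages.push(joinTextItems(content.items));
  }
  return pages.join("\n");
}
```

A caller would then pass `normalizeText(await extractPdfText(pdfjsLib, bytes))` to the keyword matcher and store the resulting score and matched keywords with the applicant record.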
Administrator View
Administrators can view these metrics directly within the Admin Dashboard and Applicant Detail pages:
- Match Score Badge: A visual percentage indicator that helps in rapid shortlisting.
- Matched Keywords List: A set of tags showing exactly which relevant terms were found in the candidate's CV.
- Extracted Text: For transparency, administrators can view the raw text extracted from the PDF to verify the context of specific keywords.
// Example of how the score is calculated internally:
const score = Math.min(100, matchedKeywords.length * 10);
Limitations
- File Format: Currently, the keyword matching algorithm only supports PDF files.
- Image-based PDFs: Resumes that are saved as images (scanned documents) without an OCR layer cannot be read by the current extractor. Candidates are encouraged to upload text-based PDFs for the best results.
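A scanned resume typically yields little or no text after extraction. One possible way to flag such uploads is shown below. The function name and the threshold of 20 characters are arbitrary assumptions for illustration, not part of the current system.

```javascript
// Heuristic: treat a resume as image-based when extraction yields almost no text.
// The default threshold of 20 non-whitespace characters is an arbitrary assumption.
function looksImageBased(extractedText, minChars = 20) {
  const visibleChars = extractedText.replace(/\s/g, "").length;
  return visibleChars < minChars;
}
```

A check like this could be used to ask the candidate for a text-based PDF instead of silently recording a score of 0%.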