CV Parsing & Extraction
Overview
The CV Parsing & Extraction system reads uploaded resumes and identifies candidate strengths automatically. Using asynchronous background processing, it extracts the raw text from each PDF and matches that text against a predefined library of industry-standard keywords, giving administrators a "Match Score" for every application as soon as parsing completes.
The Parsing Workflow
The extraction process is triggered automatically upon form submission. The workflow follows these steps:
- Storage: The applicant's PDF is uploaded to the Supabase cvs storage bucket.
- Trigger: The frontend invokes the parse-cv Edge Function with the file path and applicant ID.
- Extraction: The Edge Function uses pdfjs-dist to parse the PDF layers and compile raw text.
- Analysis: The system scans the text for specific job-related keywords.
- Persistence: The extracted text, matched keywords, and final match score are saved back to the applicants table in the database.
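The steps above can be sketched as a small orchestration function. This is a minimal sketch, not the project's actual implementation: ParseResult, ParseDeps, parseCv, and every dependency name are hypothetical, and the real Edge Function presumably talks to Supabase Storage, pdfjs-dist, and the database directly. Passing each step in as a function keeps the example self-contained.

```typescript
// Hypothetical result shape: the three values the workflow persists.
interface ParseResult {
  extractedText: string;
  matchedKeywords: string[];
  matchScore: number;
}

// Hypothetical dependencies, one per workflow step after the upload.
interface ParseDeps {
  downloadCv(cvFilePath: string): Promise<Uint8Array>; // read from the cvs bucket
  extractText(pdf: Uint8Array): Promise<string>; // e.g. backed by pdfjs-dist
  analyze(text: string): { matchedKeywords: string[]; matchScore: number };
  saveResult(applicantId: string, result: ParseResult): Promise<void>; // update applicants
}

async function parseCv(
  applicantId: string,
  cvFilePath: string,
  deps: ParseDeps,
): Promise<ParseResult> {
  const pdf = await deps.downloadCv(cvFilePath); // Storage (read side)
  const extractedText = await deps.extractText(pdf); // Extraction
  const { matchedKeywords, matchScore } = deps.analyze(extractedText); // Analysis
  const result: ParseResult = { extractedText, matchedKeywords, matchScore };
  await deps.saveResult(applicantId, result); // Persistence
  return result;
}
```

Because each step is injected, the ordering can be exercised with in-memory fakes before wiring in real storage and database calls.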
Edge Function: parse-cv
The core logic resides in a Supabase Edge Function. This allows the heavy lifting of PDF processing to happen off the main browser thread, ensuring a smooth user experience for the applicant.
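As an illustration of the extraction step, the sketch below walks every page of a loaded document and joins its text items. The interfaces are hypothetical structural types that mirror only the small part of the pdfjs-dist document object used here (numPages, getPage, and getTextContent returning items whose text lives in a str field); extractRawText and itemText are assumed names, and the real function may join or clean the text differently.

```typescript
// Hypothetical minimal types mirroring the subset of pdfjs-dist used below.
interface TextContentLike {
  items: ReadonlyArray<object>; // text items carry a `str` field; other items do not
}
interface PageLike {
  getTextContent(): Promise<TextContentLike>;
}
interface PdfDocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // page numbers start at 1
}

// Return the item's text, or "" for items without a string `str` field.
function itemText(item: object): string {
  const str = (item as { str?: unknown }).str;
  return typeof str === "string" ? str : "";
}

async function extractRawText(pdf: PdfDocumentLike): Promise<string> {
  const pages: string[] = [];
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const content = await page.getTextContent();
    const pageText = content.items
      .map(itemText)
      .filter((s) => s.trim().length > 0) // drop whitespace-only fragments
      .join(" ");
    pages.push(pageText);
  }
  return pages.join("\n"); // one line of text per page
}
```

With pdfjs-dist itself, the document would typically come from getDocument({ data }).promise, where data holds the PDF bytes downloaded from storage.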
API Interface
To manually trigger or integrate the parser, use the following interface:
Endpoint: parse-cv
Method: POST
Request Body:
{
  applicantId: string; // The UUID of the applicant in the database
  cvFilePath: string; // The path to the file in the 'cvs' storage bucket
  customKeywords?: string[]; // Optional: additional keywords to look for
}
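A function receiving this body would likely want to check its shape before doing any work. The type guard below is a minimal sketch of such a check; ParseCvRequest and isParseCvRequest are hypothetical names, and the rule that both string fields must be non-empty is an assumption rather than documented behavior.

```typescript
// Mirrors the documented request body.
interface ParseCvRequest {
  applicantId: string;
  cvFilePath: string;
  customKeywords?: string[];
}

// Hypothetical runtime check for an untrusted JSON body.
function isParseCvRequest(body: unknown): body is ParseCvRequest {
  if (typeof body !== "object" || body === null) return false;
  const { applicantId, cvFilePath, customKeywords } = body as Record<string, unknown>;

  const hasRequiredFields =
    typeof applicantId === "string" && applicantId.length > 0 &&
    typeof cvFilePath === "string" && cvFilePath.length > 0;

  // customKeywords is optional, but if present it must be an array of strings.
  const keywordsOk =
    customKeywords === undefined ||
    (Array.isArray(customKeywords) && customKeywords.every((k) => typeof k === "string"));

  return hasRequiredFields && keywordsOk;
}
```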
Example Usage:
const { data, error } = await supabase.functions.invoke("parse-cv", {
  body: {
    applicantId: "123-abc",
    cvFilePath: "171589200-resume.pdf",
  },
});
Keyword Matching & Scoring
The system evaluates resumes based on a predefined library of keywords relevant to Japanese language proficiency and technical roles.
Matching Logic
- Word Boundary Detection: The parser uses regular expressions to ensure keywords are matched as whole words (e.g., the keyword "Java" does not match inside "JavaScript"; "JavaScript" is matched only if it is itself a keyword).
- Case Insensitivity: Matches are found regardless of how the candidate capitalized the text.
- Score Calculation: The system grants 10 points per unique keyword match, capped at a maximum score of 100, which is displayed as a percentage.
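The three rules above can be illustrated with a short sketch. The function and constant names (matchKeywords, calculateMatchScore, escapeRegExp, POINTS_PER_KEYWORD, MAX_SCORE) are hypothetical, and the use of the \b word-boundary assertion is an assumption about how the whole-word check might be written.

```typescript
const POINTS_PER_KEYWORD = 10;
const MAX_SCORE = 100;

// Escape regex metacharacters so keywords such as "Node.js" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return each keyword (deduplicated case-insensitively) that appears in the text as a whole word.
function matchKeywords(text: string, keywords: string[]): string[] {
  const matched: string[] = [];
  const seen = new Set<string>();
  for (const keyword of keywords) {
    const key = keyword.toLowerCase();
    if (seen.has(key)) continue; // count each unique keyword once
    seen.add(key);
    // \b only works next to word characters, so a keyword that starts or
    // ends with punctuation (e.g. "C++") would need different handling.
    const pattern = new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "i");
    if (pattern.test(text)) matched.push(keyword);
  }
  return matched;
}

// 10 points per unique match, capped at 100.
function calculateMatchScore(matchedCount: number): number {
  return Math.min(matchedCount * POINTS_PER_KEYWORD, MAX_SCORE);
}

const sample = "Senior JavaScript and python developer, JLPT N2, Docker.";
const found = matchKeywords(sample, ["Java", "JavaScript", "Python", "N2", "Docker", "Go"]);
// found is ["JavaScript", "Python", "N2", "Docker"]; "Java" alone does not match.
const score = calculateMatchScore(found.length); // 40
```

Under this scheme a resume reaches the cap once ten unique keywords match; further matches do not change the score.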
Supported Categories
The analyzer currently looks for:
- Japanese Proficiency: JLPT levels (N1, N2, N3, N4, N5) and language skills.
- Tech Stack: Programming languages (Python, JavaScript, Go), Frameworks (React, Node.js), and Databases.
- Infrastructure: Cloud providers (AWS, Azure, GCP) and DevOps tools (Docker, Kubernetes).
- Soft Skills: Leadership, communication, and project management.
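One way to organize such a library is a map from category to keywords that is flattened into a single list before matching. The sketch below is illustrative only: KEYWORD_LIBRARY and buildKeywordList are hypothetical names, the entries are limited to the examples named above, and appending customKeywords from the request to the built-in list is an assumption about how the two are combined.

```typescript
// Illustrative entries only, taken from the categories listed above.
const KEYWORD_LIBRARY: Record<string, string[]> = {
  "Japanese Proficiency": ["JLPT", "N1", "N2", "N3", "N4", "N5"],
  "Tech Stack": ["Python", "JavaScript", "Go", "React", "Node.js"],
  "Infrastructure": ["AWS", "Azure", "GCP", "Docker", "Kubernetes"],
  "Soft Skills": ["Leadership", "Communication", "Project Management"],
};

// Flatten the library, append any custom keywords, and drop
// case-insensitive duplicates and blank entries.
function buildKeywordList(
  library: Record<string, string[]>,
  customKeywords: string[] = [],
): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  const add = (keyword: string): void => {
    const trimmed = keyword.trim();
    const key = trimmed.toLowerCase();
    if (key.length === 0 || seen.has(key)) return;
    seen.add(key);
    result.push(trimmed);
  };
  for (const category of Object.keys(library)) {
    for (const keyword of library[category]) add(keyword);
  }
  for (const keyword of customKeywords) add(keyword);
  return result;
}
```

Deduplicating at this stage keeps a custom keyword that repeats a built-in one from being scored twice.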
Administrator View
Once the parsing is complete, the results are available in the Applicant Detail page within the Admin Module.
- Match Score: Displayed as a visual percentage to help prioritize candidates.
- Matched Keywords: A list of tags showing exactly which required skills were found in the resume.
- Extracted Text: The raw text output from the PDF is stored and viewable, allowing administrators to search through the resume content without downloading the file.