Smart Resume Parser in Practice: Building an Automated Talent Screening System with spaCy + Flask
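1. Project Background and Technology Selection
In human resources, HR teams that process hundreds of resumes a day face three main problems: manual screening is slow, key information is easily missed, and comparing candidates across documents is difficult. This tutorial builds an end-to-end resume parsing system that uses NLP to extract each candidate's core information automatically and exposes the results through a web service for visual display.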
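Technology stack overview
Component | Role | Alternatives
PDFPlumber | PDF text extraction | PyPDF2, camelot
spaCy | Entity recognition and NLP processing | NLTK, Transformers
Flask | Web service framework | FastAPI, Django
Vue.js | Frontend display (optional) | React, Angular
2. System Architecture Design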
graph TD
    A[User uploads PDF resume] --> B{Flask backend}
    B --> C
    C --> D[Text preprocessing]
    D --> E[Entity recognition model]
    E --> F[Key information extraction]
    F --> G[Database storage]
    G --> H[Frontend display]
    style B fill:#4CAF50,color:white
    style E fill:#2196F3,color:white
3. Core Module Implementation in Detail
3.1 PDF Parsing Layer (PDFPlumber)
# pdf_parser.py
import re

import pdfplumber


def extract_text(pdf_path):
    """Extract the text of every page in a PDF and return it cleaned."""
    text = ""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() can return None for a page with no text layer
            text += (page.extract_text() or "") + "\n"
    return clean_text(text)


def clean_text(raw_text):
    # Replace control characters and collapse extra whitespace
    text = re.sub(r'[\x00-\x1F]+', ' ', raw_text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text
Advanced processing tips:
[*]Scanned PDFs: integrate Tesseract OCR;
[*]Table data: use the extract_tables() method;
[*]Layout analysis: read character coordinates from the chars objects.
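The table tip can be made more concrete with a small sketch. pdfplumber's `page.extract_tables()` returns each table as a list of rows, and each row as a list of cell strings (with `None` for empty cells). The helper names `table_to_records` and `extract_table_records` below are hypothetical, and the sketch assumes that the first row of each table is a header row, which will not hold for every resume layout.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]


def table_to_records(table: List[Row]) -> List[Dict[str, str]]:
    """Turn one table (header row first) into a list of dicts.

    Assumption: the first row holds the column names.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if not any(cells):
            continue  # skip rows in which every cell is empty
        records.append(dict(zip(header, cells)))
    return records


def extract_table_records(pdf_path: str) -> List[Dict[str, str]]:
    """Collect records from every table on every page of a PDF."""
    import pdfplumber  # imported lazily so table_to_records has no dependency

    records: List[Dict[str, str]] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(table_to_records(table))
    return records
```

Keeping the row-to-dict conversion separate from the PDF access makes it possible to test the conversion without a sample file.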
3.2 NLP Processing Layer (spaCy)
3.2.1 Training a Custom Entity Recognition Model
1. Prepare the annotated data (JSON example; the sample sentence reads "Zhang San graduated from Peking University in 2018, majoring in Computer Science and Technology"). Offsets are character positions, with the end offset exclusive:
[
  {
    "text": "张三 2018年毕业于北京大学计算机科学与技术专业",
    "entities": [
      {"start": 0, "end": 2, "label": "NAME"},
      {"start": 3, "end": 7, "label": "GRAD_YEAR"},
      {"start": 11, "end": 15, "label": "EDU_ORG"},
      {"start": 15, "end": 23, "label": "MAJOR"}
    ]
  }
]
2. Training code:
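The `train_model` function unpacks each training item as a `(text, annotations)` pair, while the JSON sample stores entities as start/end/label objects, so a conversion step is needed between the two. One possible loader is sketched here. `load_train_data` and `preview_spans` are hypothetical helper names, not part of spaCy, and the sketch assumes end offsets are exclusive character positions, which is how spaCy interprets entity offsets.

```python
import json
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]
TrainItem = Tuple[str, Dict[str, List[Span]]]


def load_train_data(json_text: str) -> List[TrainItem]:
    """Convert JSON annotations into (text, {"entities": [...]}) tuples."""
    train_data: List[TrainItem] = []
    for record in json.loads(json_text):
        text = record["text"]
        spans: List[Span] = []
        for ent in record["entities"]:
            start, end, label = ent["start"], ent["end"], ent["label"]
            # Reject empty, reversed, or out-of-range spans early
            if not (0 <= start < end <= len(text)):
                raise ValueError(f"bad span {start}-{end} for {label!r}")
            spans.append((start, end, label))
        spans.sort()
        # Consecutive spans must not overlap
        for (_, prev_end, _), (next_start, _, _) in zip(spans, spans[1:]):
            if next_start < prev_end:
                raise ValueError(f"overlapping spans in {text!r}")
        train_data.append((text, {"entities": spans}))
    return train_data


def preview_spans(item: TrainItem) -> List[Tuple[str, str]]:
    """Return (covered text, label) pairs so offsets can be checked by eye."""
    text, annotations = item
    return [(text[s:e], label) for s, e, label in annotations["entities"]]
```

Printing `preview_spans(item)` for a few records before training is a cheap way to spot off-by-one offsets, since each label should sit next to exactly the substring it is meant to cover.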
# train_ner.py
# Note: written against the spaCy v3 training API.
import spacy
from spacy.training import Example
from spacy.util import minibatch, compounding


def train_model(train_data, output_dir, n_iter=20):
    # train_data: list of (text, {"entities": [(start, end, label), ...]})
    nlp = spacy.blank("zh")  # blank Chinese pipeline
    if "ner" not in nlp.pipe_names:
        ner = nlp.add_pipe("ner", last=True)
    else:
        ner = nlp.get_pipe("ner")

    # Register every entity label that appears in the training data
    for _, annotations in train_data:
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])

    # Training setup: update only the NER component
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    with nlp.select_pipes(disable=other_pipes):
        optimizer = nlp.initialize()
        for i in range(n_iter):
            losses = {}
            batches = minibatch(train_data, size=compounding(4.0, 32.0, 1.001))
            for batch in batches:
                # spaCy v3 updates on Example objects, not raw texts
                examples = [
                    Example.from_dict(nlp.make_doc(text), annotations)
                    for text, annotations in batch
                ]
                nlp.update(
                    examples,
                    drop=0.5,
                    sgd=optimizer,
                    losses=losses
                )
            print(f"Losses at iteration {i}: {losses}")

    nlp.to_disk(output_dir)
    print("Model saved!")
3.2.2 Keyword Matching Algorithm
# keyword_matcher.py
from spacy.matcher import Matcher

def create_matcher(nlp):
    matcher = Matcher(nlp.vocab)

    # Skill keyword patterns: a run of two or more SKILL tokens, or a single one
    skill_patterns = [
        [{"ENT_TYPE": "SKILL"}, {"OP": "+", "ENT_TYPE": "SKILL"}],
        [{"ENT_TYPE": "SKILL"}]
    ]

    # Education patterns: an institution token followed by a major token,
    # or a graduation year on its own
    edu_patterns = [
        [{"ENT_TYPE": "EDU_ORG"}, {"ENT_TYPE": "MAJOR"}],
        [{"ENT_TYPE": "GRAD_YEAR"}]
    ]

    # spaCy v3 signature: add(key, patterns). The v2 form,
    # add(key, None, *patterns), is rejected by v3.
    matcher.add("SKILL_MATCH", skill_patterns)
    matcher.add("EDU_MATCH", edu_patterns)

    return matcher

3.3 Web Service Layer (Flask)
# app.py
from flask import Flask, request, jsonify
import pdfplumber
import spacy

from keyword_matcher import create_matcher  # defined in section 3.2.2

app = Flask(__name__)

# Load the trained model
nlp = spacy.load("trained_model")
matcher = create_matcher(nlp)

@app.route('/parse', methods=['POST'])
def parse_resume():
    if 'file' not in request.files:
        return jsonify({"error": "No file uploaded"}), 400

    file = request.files['file']
    if file.filename.split('.')[-1].lower() != 'pdf':
        return jsonify({"error": "Only PDF files allowed"}), 400

    # Save to a temporary file. On Windows an open NamedTemporaryFile cannot be
    # reopened by name; there, use delete=False and remove the file manually.
    import tempfile
    with tempfile.NamedTemporaryFile(delete=True) as tmp:
        file.save(tmp.name)

        # Parse the PDF (extract_text is the PDF-to-text helper; it is not
        # defined in this file)
        text = extract_text(tmp.name)

        # NLP processing
        doc = nlp(text)
        matches = matcher(doc)

        # Extract results (extract_skills and extract_education are likewise
        # not defined in this listing)
        results = {
            "name": get_name(doc.ents),
            "skills": extract_skills(doc.ents, matches),
            "education": extract_education(doc.ents, matches)
        }

    return jsonify(results)

def get_name(entities):
    # Return the text of the first entity labeled NAME
    for ent in entities:
        if ent.label_ == "NAME":
            return ent.text
    return "Not recognized"

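The route above calls extract_skills and extract_education, which this listing never defines. A minimal sketch of what they might look like, assuming the custom labels SKILL, EDU_ORG, MAJOR and GRAD_YEAR used by the matcher patterns. The deduplication and the returned shapes are illustrative choices, not the author's implementation:

```python
# Hypothetical helpers for app.py; the names match the calls in parse_resume().
# Entities only need .label_ and .text, which spaCy Span objects provide.

def _unique(texts):
    """Drop duplicates while keeping first-seen order."""
    seen = set()
    ordered = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            ordered.append(text)
    return ordered


def extract_skills(entities, matches=None):
    """Return the distinct texts of all SKILL entities.

    `matches` is accepted only to keep the call signature used in the route;
    this sketch relies on entity labels alone and ignores it.
    """
    return _unique(ent.text for ent in entities if ent.label_ == "SKILL")


def extract_education(entities, matches=None):
    """Group education-related entity texts by label."""
    fields = {"EDU_ORG": "schools", "MAJOR": "majors", "GRAD_YEAR": "years"}
    result = {key: [] for key in fields.values()}
    for ent in entities:
        key = fields.get(ent.label_)
        if key is not None:
            result[key].append(ent.text)
    return {key: _unique(values) for key, values in result.items()}
```

Resolving the matcher output itself (match id, start, end) to text would need the Doc or nlp.vocab, which the route does not pass in; that is why the sketch works from entity labels only.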
if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True)

4. System Optimization and Extension
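4.1 Performance Optimization Strategies
[*]Asynchronous processing: offload long-running parsing tasks to Celery;
[*]Caching: cache frequently requested parsing results in Redis;
[*]Model quantization: shrink the model to cut memory use and latency. Note that spacy-transformers provides transformer pipeline components rather than quantization, so a smaller pipeline or disabling unused components is the simpler first step.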
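As a rough illustration of the caching idea, results can be keyed by a hash of the uploaded bytes so that re-uploading the same file skips parsing. The sketch below uses an in-process dict as a stand-in for Redis; ResultCache and parse_fn are hypothetical names, and a real deployment would replace the dict with Redis calls and an expiry policy.

```python
import hashlib


class ResultCache:
    """Cache parse results keyed by the SHA-256 of the file content."""

    def __init__(self):
        self._store = {}   # stand-in for Redis: digest -> parsed result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(data: bytes) -> str:
        """Derive the cache key from the raw file bytes."""
        return hashlib.sha256(data).hexdigest()

    def get_or_parse(self, data: bytes, parse_fn):
        """Return the cached result for `data`, calling parse_fn on a miss."""
        key = self.key_for(data)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = parse_fn(data)
        self._store[key] = result
        return result
```

Keying on content rather than filename means a renamed copy of the same resume still hits the cache.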
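4.2 Feature Extension Directions
[*]Multilingual support: integrate multilingual models;
[*]Resume deduplication: detect near-duplicate resumes with the SimHash algorithm;
[*]Smart recommendation: match candidate skills against job requirements.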
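Two of these directions are small enough to sketch. The code below is a minimal SimHash over pre-tokenized text plus a simple skill coverage score, under stated assumptions: tokens come from elsewhere (for Chinese resumes, for example, the spaCy tokens of the Doc), MD5 is used only as a convenient token hash, and the threshold of 3 bits and all function names are illustrative choices.

```python
import hashlib


def simhash(tokens, bits=64):
    """Compute a SimHash fingerprint from an iterable of string tokens.

    `bits` must be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(tokens_a, tokens_b, threshold=3):
    """Treat two token lists as duplicates if their fingerprints are close."""
    return hamming_distance(simhash(tokens_a), simhash(tokens_b)) <= threshold


def skill_match_score(candidate_skills, required_skills):
    """Fraction of required skills the candidate covers (case-insensitive)."""
    required = {skill.lower() for skill in required_skills}
    if not required:
        return 0.0
    have = {skill.lower() for skill in candidate_skills}
    return len(required & have) / len(required)
```

The coverage score is deliberately asymmetric: extra candidate skills do not lower it, which suits ranking candidates against a fixed job description.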
5. Complete Code Deployment Guide
5.1 Environment Setup
# Create a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install flask spacy pdfplumber
python -m spacy download zh_core_web_sm

5.2 Run Procedure
[*]Prepare annotated data (at least 50 examples);
[*]Train the model: python train_ner.py data.json output_model (app.py loads "trained_model", so either pass that name as the output directory or update the path in app.py);
[*]Start the service: python app.py;
[*]Front-end call example:
<input type="file" id="resumeUpload" accept=".pdf">
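The snippet above is only the upload field. To exercise the /parse endpoint without a browser, a small Python client can post a file as multipart/form-data. This is a sketch using only the standard library; the URL assumes Flask's default development address (http://127.0.0.1:5000), and build_multipart and post_resume are hypothetical helper names.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_resume(path, url="http://127.0.0.1:5000/parse"):
    """Upload a PDF to the /parse endpoint and return the decoded JSON."""
    with open(path, "rb") as fh:
        # The field name "file" matches request.files['file'] in app.py
        body, header = build_multipart("file", os.path.basename(path), fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, post_resume("resume.pdf") should return the dictionary with the name, skills and education keys built by the route.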
6. Common Problems and Solutions
6.1 PDF Parsing Failures
[*]Check whether the file is a scanned document (which requires OCR);
[*]Try a different extraction mode:
# Use layout analysis
with pdfplumber.open(pdf_path) as pdf:
    page = pdf.pages[0]  # pdf.pages is a list; take the first page here
    text = page.extract_text(layout=True)

6.2 Insufficient Entity Recognition Accuracy
[*]Increase the amount of annotated data (at least 500 examples recommended);
[*]Use active learning to decide which examples to annotate next;
[*]Try transfer learning:
# Fine-tune from a pretrained pipeline (needs the spacy-transformers package
# and the downloaded zh_core_web_trf model)
nlp = spacy.load("zh_core_web_trf")

7. Conclusion and Outlook
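This tutorial built a complete pipeline from PDF parsing to a web service. A production deployment also needs to consider distributed processing, continuous model retraining, and security auditing. As large language models mature, an LLM could be integrated for more complex inference, such as deriving a candidate's capability map from project experience.
By working through this project, developers can learn:
[*]the end-to-end workflow of NLP engineering;
[*]best practices for PDF parsing;
[*]web service API design;
[*]model training and tuning methods.
Start with simple scenarios and iterate step by step toward a resume parsing system that fits your business needs.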