
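This article covers some practical Python through a resume-screening example. It defines a ReadDoc class that reads Word files and a search_word function that filters them by keyword. We hope you find it helpful.

Python Automation in Practice: Screening Resumes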


Resume screening

The resume information is as follows:

[Image: resume information]

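Defining the ReadDoc class to read Word files

Given:

We want to find the resumes that contain specified keywords (for example Python or Java).

Approach:

Read every Word file in bulk (using glob to collect the files), extract all of their readable content, filter it by keyword, and collect the paths of the matching resumes.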
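The file-gathering step of this approach can be sketched on its own with the standard library. This is only a minimal sketch: the helper name `find_docx_files` and the filter for Word's temporary `~$` lock files are assumptions added for illustration, not part of the original script.

```python
import glob
import os


def find_docx_files(folder):
    """Return the sorted paths of .docx files directly inside folder.

    Hypothetical helper, not part of the original script.
    """
    pattern = os.path.join(folder, '*.docx')
    paths = []
    for path in glob.glob(pattern):
        name = os.path.basename(path)
        # Word creates temporary "~$..." lock files while a document is open;
        # they are not real documents, so skip them. Directories are skipped too.
        if os.path.isfile(path) and not name.startswith('~$'):
            paths.append(path)
    return sorted(paths)
```

Calling `find_docx_files(os.getcwd())` would return the candidate resumes in the current directory, ready to be read one by one.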

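One thing to watch out for: not every resume is laid out as paragraphs. Resumes downloaded from Liepin (獵聘), for example, use a table layout, while resumes downloaded from "boss" use paragraphs. Keep this in mind when reading the files. The demo script in this exercise works with table-format resumes.

For this reason we define a dedicated ReadDoc class with two methods, one for reading paragraphs and one for reading tables.

The example script is as follows: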

# coding:utf-8
import os

from docx import Document


class ReadDoc(object):              # ReadDoc reads a Word file
    def __init__(self, path):       # the constructor takes the path of the Word file
        self.doc = Document(path)
        self.p_text = ''
        self.table_text = ''

        self.get_para()
        self.get_table()

    def get_para(self):             # read the paragraphs of the Word file
        for p in self.doc.paragraphs:
            self.p_text += p.text + '\n'    # one line per paragraph
        print(self.p_text)

    def get_table(self):            # loop over the tables and read their content
        for table in self.doc.tables:
            for row in table.rows:
                _cell_str = ''      # collects the full text of one row
                for cell in row.cells:
                    _cell_str += cell.text + ','    # separate cells with ","
                self.table_text += _cell_str + '\n'     # one line per table row
        print(self.table_text)


if __name__ == '__main__':
    path = os.path.join(os.getcwd(), 'test_file/簡歷1.docx')
    doc = ReadDoc(path)     # the constructor already prints the paragraph and table text

Here is the output of running the ReadDoc class:

[Image: output of the ReadDoc script]

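Defining the search_word function to filter Word files for matching resumes

Now that the resume Word documents can be read, the next step is to filter the extracted content by keyword and keep only the resumes that contain every keyword.

The example script is as follows: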

# coding:utf-8
import glob
import os

from docx import Document


class ReadDoc(object):              # ReadDoc reads a Word file
    def __init__(self, path):       # the constructor takes the path of the Word file
        self.doc = Document(path)
        self.p_text = ''
        self.table_text = ''

        self.get_para()
        self.get_table()

    def get_para(self):             # read the paragraphs of the Word file
        for p in self.doc.paragraphs:
            self.p_text += p.text + '\n'    # one line per paragraph
        # print(self.p_text)        # debug: print the paragraph content

    def get_table(self):            # loop over the tables and read their content
        for table in self.doc.tables:
            for row in table.rows:
                _cell_str = ''      # collects the full text of one row
                for cell in row.cells:
                    _cell_str += cell.text + ','    # separate cells with ","
                self.table_text += _cell_str + '\n'     # one line per table row
        # print(self.table_text)    # debug: print the table content


def search_word(path, targets):     # return the resumes matching all keywords; targets is a list
    result = glob.glob(path)
    final_result = []               # stores the paths of the matching files

    for i in result:                # loop over the glob results
        isuse = True                # whether this file is a match

        if os.path.isfile(i):       # only handle regular files
            if i.endswith('.docx'):      # for ".docx" files, read the content with ReadDoc
                doc = ReadDoc(i)
                p_text = doc.p_text         # text extracted from the Word file
                table_text = doc.table_text
                all_text = p_text + table_text

                for target in targets:      # check that every keyword is present
                    if target not in all_text:
                        isuse = False
                        break

                if not isuse:
                    continue
                final_result.append(i)
    return final_result


if __name__ == '__main__':
    path = os.path.join(os.getcwd(), '*')
    # '埋點' (event tracking) was added to "簡歷1.docx" on purpose for this demo
    result = search_word(path, ['python', 'golang', 'react', '埋點'])
    print(result)

The output is as follows:

[Image: output of the search_word script]
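Note that the check `target not in all_text` in the script above is case-sensitive, so a resume that writes "Python" would not match the keyword 'python'. A minimal sketch of a case-insensitive check is shown below. The function name `contains_all_keywords` is hypothetical, and it works on plain strings, so it can be tried without any Word file.

```python
def contains_all_keywords(text, targets):
    """Return True if every keyword in targets occurs in text, ignoring case.

    Hypothetical helper, not part of the original script.
    """
    lowered = text.casefold()
    return all(target.casefold() in lowered for target in targets)


sample = 'Skills: Python, Golang, React'
print(contains_all_keywords(sample, ['python', 'golang']))  # True
print(contains_all_keywords(sample, ['python', 'java']))    # False
```

Inside `search_word`, a helper like this could stand in for the inner `for target in targets` loop, for example `if contains_all_keywords(all_text, targets): final_result.append(i)`.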
