PorkCNN
Pork Barrel Classification Task for Taiwan Pork Barrel Context Using Convolutional Neural Networks 🐖🐖🐖
Enviroment Setting
- Python 3.8
- tensorflow>=2.4.4
- numpy
- scikit-learn
- numpy
- pandas
- lxml
- jieba
Notes
- Step-by-step tutorial for training machine learning workflow please finds here.
- K-folds validation can be found from here.
- The code for creating TensorBoard finds here.
-
TensorBoard for the measurements and visualizations during the machine learning workflow can be found from here.
- The training data available upon request.
Original Trianing Data (Labled Pork Barrel Legislation)
The collection of training data consists of 7243 pieces of legislation in total and containing 4852 training sets and 2391 test sets, respectively. The collection of legislation was manually labelled with binary-instance classification by Dr. Ching-Jyuhn Luor and his research team at National Taipei University. They created hand-labelled legislation by reading the text, devoted either to promoting the pork-barrel project in a district (earmarked projects) or cultivating favored minorities by providing subsidies to such as veterans association.
text pork_bill
0 軍人撫卹條例第十八條條文修正草案落實軍人及眷屬之照顧... 1
1 所得稅法第十七條條文修正草案學費之特別扣除額應以每人... 1
2 所得稅法第十一條條文修正草案保險人員申報時得扣除一定... 1
3 土地稅法第二十八條之一條文修正草案土地贈與文教基金會... 1
4 敬老福利生活津貼暫行條例第三條條文修正草案放寬請領資... 1
5 洗錢防制法部分條文修正草案給予法官較大權限;起訴期間... 0
6 日據時代日本政府國庫券及債券處理條例草案就是保障日據... 0
7 大陸地區人民來臺從事觀光活動條例草案開放大陸人民觀光... 0
8 限制欠稅人或欠稅營利事業負責人出境實施條例草案現行法... 0
9 使用牌照稅法第七條條文修正草案民營汽車駕駛人訓練機構... 1
Num of Train/ Test Split
Num of Train Set: 4852
Not Pork vs Pork: {0: 3167, 1: 1685}
Num of Test Set: 2391
Not Pork vs Pork: {0: 1566, 1: 825}
Model Building & Specification
Model: "PorkCNN"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) multiple 586600
_________________________________________________________________
conv1d (Conv1D) multiple 40100
_________________________________________________________________
conv1d_1 (Conv1D) multiple 40100
_________________________________________________________________
conv1d_2 (Conv1D) multiple 60100
_________________________________________________________________
conv1d_3 (Conv1D) multiple 60100
_________________________________________________________________
conv1d_4 (Conv1D) multiple 80100
_________________________________________________________________
conv1d_5 (Conv1D) multiple 100100
_________________________________________________________________
global_max_pooling1d (Global multiple 0
_________________________________________________________________
dense (Dense) multiple 179456
_________________________________________________________________
dropout (Dropout) multiple 0
_________________________________________________________________
dense_1 (Dense) multiple 257
=================================================================
Total params: 1,146,913
Trainable params: 1,146,913
Non-trainable params: 0
_________________________________________________________________
Evaluation & Classification
Training conntext: Number of Pork Legislation 2510; Number of None-Pork Legislation is 4733.
precision recall f1-score support
0 0.95 0.97 0.96 1566
1 0.94 0.91 0.92 825
accuracy 0.95 2391
macro avg 0.95 0.94 0.94 2391
weighted avg 0.95 0.95 0.95 2391
Prediction: Not Pork(0) Prediction: Pork(1)
Acutal: Not Pork(0) 1520 46
Acutal: Pork (1) 76 749
11/11 [==============================] - 3s 312ms/step - loss: 0.1779 - accuracy: 0.9477
[0.17792339622974396, 0.9477206468582153]
Learning Curves
Note: The mean training loss and accuracy measured over each epochs, and the validation loss and accuracy measured at the end of each.
Application on New Dataset (Parliamentary Questions 專案質詢 from 1993 - 2020)
Note: The figure shows the number of parliamentary questions each year.
Note: This figure shows the number of parliamentary questions identified as pork barrel feature by the CNN model across year.
Top 10 of 2000 Sampled 6th Parliamentary Questions (more likely to express particularistic policies)
Legislator | Pork/Constituency Interest | Questions | Topic | Key Word |
---|---|---|---|---|
陳啟昱 | 0.996769189834595 | 鑑於現行《所得稅法》第十七條規定特別扣除額教育支出部分,僅以納稅義務人之子女就讀大專院校為限… | Income tax; education expenses; deductions | Income Tax Law; Special Deductions; Educational Expenditure |
林正峰 | 0.995515823364258 | 針對政府準備修法推動「二代健保」,健保保費採取「年度所得總額」為計算基礎,而非採用扣除免稅額… | National; Health Insurance; Insurance Premium | Second-generation health insurance; total annual income |
彭添富 | 0.992780447006226 | 針對「辦理九十四年原住民中低收入戶家庭租屋補助計畫」專案補助計畫,特向行政院提出質詢。 | Aboriginal life | Aboriginal low- and middle-income households; housing subsidies |
李復興 | 0.992780089378357 | 發現自九十三年一月間起,勞保局陸續清查有一千多名國、公營事業退休員工溢領敬老津貼,截至93年… | Old-age benefits; labor retirement | Retired employees of public enterprises; over-receiving allowance for the elderly |
盧秀燕 | 0.992639720439911 | 針對早期退除役軍官給與補助金發放金額過低,實無法解決終身生活所需,希望相關單位考量實際情況,… | Veterans welfare | Grants for early retired officers |
李顯榮 | 0.990033149719238 | 對於陳水扁總統的「凱子外交」政策,不僅僅沒達到外交目的,更是浪費公帑,政府前後援賽金額高達5… | Foreign aid; farmer welfare | Triumphant diplomacy; old farmer allowance; Senegal |
丁守中 | 0.988385319709778 | 針就民眾陳情指出,目前政府對身心障礙者提供之生活津貼,依身心障礙程度等級分為1000元至50… | Welfare for the handicapped | Living allowance for the physically and mentally handicapped |
馮定國 | 0.985531985759735 | 鑒於國內經濟結構的快速調整,與人口高齡化的進展,未來中高齡失業問題必將日益嚴重,致使國人老年… | Elderly welfare | Aging; middle and old age unemployment |
彭添富 | 0.983698368072510 | 針對「豪雨成災,農作物損失補償」問題,特向行政院提出質詢。 | Agricultural subsidies | Heavy rain; crops |
曾華德 | 0.979519009590149 | 為民國38年至43年間戌守大陳島等地區之中華民國前江、浙、閩、粵反共救國軍補發薪餉問題,攸關… | Military pay | Anti-Communist Salvation Army Reimbursement of Salary |
林鴻池 | 0.978044390678406 | 針對諸多已獲得五年五百億與卓越計畫獎補助的大學擬調漲學雜費,但教育部長杜正勝曾承諾,獲得上述… | Education grant; education expenses; university | Project Excellence Award subsidy; university tuition |
王昱婷 | 0.976034879684448 | 針對根據內政部最新統計,國內的嬰兒出生率再創新低點,今年1到4月只有6萬5400個小嬰兒出生… | Fertility rate; population policy | Birth rate; fertility rate |
彭添富 | 0.974732995033264 | 針對「觀音鄉保生社區風貌營造規劃設計」專案補助計畫,特向行政院提出質詢。 | Community project; government subsidy | Baosheng community |
彭紹瑾 | 0.970751523971558 | 針對政府為提高生育率,有意將「育嬰假」放寬至全體勞工,並增加六個月的「育嬰留職停薪津貼」,此… | Women’s Welfare | Parental leave; leave without pay allowance |
吳志揚 | 0.968230128288269 | 針對政府打著照顧中產階級的漂亮旗號,擬調增受薪大眾的薪資特別扣除額,但是只在薪資扣除額調整幅… | Salary deduction | Special salary deduction |
Last 10 Rows of 2000 Samples (less likely to express particularistic policies)
Legislator | Pork/Constituency Interest | Legislative Questions | Topic | Key Word |
---|---|---|---|---|
李復甸 | 0.000021549063604 | 鑑於刑事偵察實務上緩起訴制度,有淪於檢察官為同案被告間不利證詞取得之交換手段之虞,破壞緩起訴… | Investigation; litigation procedure | Criminal investigation; secret witness |
林建榮 | 0.000020212990421 | 為立法院朝野協商修改銀行法,明定信用卡、現金卡循環利率與銀行公布的基本放款利率差距不得超過十… | Financial management; bank management | Banking Law; Cash Card; Revolving Interest Rate; Card Debt |
林正峰 | 0.000019731034627 | 針對行政院長張俊雄日前穿著輕便的長袖白襯衫,要求各級機關和學校身體力行節約能源,當場台下官員… | Energy policy | Energy saving |
林正峰 | 0.000019187420548 | 鑑於近年來臺灣地區毒品氾濫,吸毒人數劇增,危害國民身心健康甚鉅,因而滋生之犯罪更成為影響社會… | Tobacco Restriction; Hospital | Drug Abuse; Departmental Hospital; Special Agency for Drug Rehabilitation |
王幸男 | 0.000017634354663 | 針對道路人孔蓋或管線挖掘後回填品質不佳,或是公共安全沒有做好,導致各種傷害和死亡案件,一直居… | Public safety | Manhole cover; public safety; road quality |
管碧玲 | 0.000013002485503 | 針對近日台灣鐵路管理局發生網路訂票系統遭到內部人員惡意壟斷,導致一般民眾訂票權益受損之弊端;… | Railway management; ticket | Online booking; monopoly; Taiwan Railway |
黃敏惠 | 0.000011985112906 | 就近日來爆發知名提神飲料遭下毒事件,已知有四位台中市民因誤喝中毒,並有一人已不治死亡。此一類… | Drinks; Poisoning | Drinks; Poisoning |
陳朝龍 | 0.000011277100384 | 針對英國政府宣稱台灣出口至該國禽鳥,檢驗出感染禽流感H5N1病毒死亡。由於我國迄今並未發現有… | Infectious disease prevention and control; smuggling | British Government; Taiwanese birds; Avian Influenza; Smuggling |
林進興 | 0.000007685628589 | 針對行政院金融監督管理委員會為了解決國人廣大的卡債問題,積極推廣債務協商機制,政策本意良好,… | Credit card; debt | Card debt; negotiation mechanism |
賴清德 | 0.000006586606105 | 針對市售豆類製品疑含「過氧化氫」情形嚴重,傷害消費者健康,爰要求相關單位依食品衛生管理法切實… | Food Management | Soy Products; Hydrogen Peroxide |
邱毅 | 0.000006585601113 | 針對新聞局認定TVBS應為綜合台非新聞台乙案,準備將TVBS轉頻一事,日前行政院新聞局認定T… | Freedom of the press; TV station | Ownership Structure; Foreign Investment; Organic Law of the National Communications Commission; Freedom of Reporting |
羅世雄 | 0.000006475097052 | 針對手機通訊業者辦理新辦戶及更換SIM卡程序中,出現犯罪集團持偽造身分證,藉此竊取個人資料,… | Telecommunications administration; national identity card; privacy; forgery | Mobile phone; personal data theft; criminal group |
周守訓 | 0.000006256129382 | 針對日前媒體報導陳水扁總統宣示,十二月二十五日耶誕節是重要的宗教節慶,更是我國的行憲紀念日,… | Religion; Holiday | Christmas; Constitutional Anniversary; Religious Freedom; Separation of Church and State |
郭榮宗 | 0.000004869816621 | 對於板橋地方法院安全不設防?煙毒犯在地檢署廁所施打毒品,死亡兩天才被人發現,凸顯法警素質不足… | District Court; Drugs | Banqiao District Court; drug offenders; drug abuse |
潘孟安 | 0.000002590457370 | 就立法委員選舉,改採單一選區兩票制即將首度實施,中央選舉委員應加強宣導「單一選區兩票制」的新… | election | Legislative elections; two-vote system for a single constituency |
Use End-to-End Model
If there’s anything you need about the application and end-to-end use, please don’t hesitate to send me a message.
from tensorflow import keras
model = keras.models.load_model('PorkCNN\lour_pork_model')
Cite:
For citing this work, you can refer to the present GitHub project. For example, with BibTeX:
@misc{PorkCNN,
howpublished = {\url{https://github.com/davidycliao/PorkCNN}},
title = {A Small Project for Pork Barrel Legislation Classification Using Convolutional Neural Networks},
author = {David, Yen-Chieh Liao},
publisher = {GitHub},
year = {2021}
}