
Operation Optimization of Wet Desulfurization System Based on Deep Reinforcement Learning

WU Lei, KANG Yingwei

  1. School of Automation Engineering, Shanghai University of Electric Power, Shanghai 200090
  • Received: 2021-06-09 Revised: 2022-01-09 Online: 2022-05-25 Published: 2022-07-23
  • Funding:
    Supported by the National Natural Science Foundation of China (61573239) and the Shanghai Engineering Research Center of Intelligent Management and Control for Power Process (14DZ2251100).

WU Lei, KANG Yingwei. Operation Optimization of Wet Desulfurization System Based on Deep Reinforcement Learning[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(5): 1067-1087.
To address the poor adaptability, low efficiency, and high resource consumption of conventional operation optimization methods for limestone/gypsum wet flue gas desulfurization (WFGD) systems, this paper proposes an operation optimization method based on data-driven modeling and deep reinforcement learning. First, because conventional principal component analysis (PCA) captures only linear relationships among feature variables, mutual information (MI) is introduced into PCA to improve the principal component analysis results and the selection of input variables for a long short-term memory (LSTM) network. Next, an improved particle swarm optimization (IPSO) algorithm determines the optimal LSTM parameter combination, which reduces the cost of training the LSTM. Finally, the resulting MIPCA-IPSO-LSTM model serves as a fast interactive environment between the desulfurization system and the reinforcement learning agent. Because the conventional deep deterministic policy gradient (DDPG) algorithm converges slowly, trains unstably and at length, and uses samples inefficiently, the paper builds the optimization simulation platform on a DDPG variant with a dual experience replay mechanism based on cumulative return (DER-DDPG). Taking the desulfurization system of a 600 MW unit at a power plant as a case study, simulations implemented in Python with the TensorFlow framework show that, compared with conventional PCA, MIPCA retains more of the original data's information while removing redundant information; IPSO improves the global search ability and convergence speed of PSO; and, compared with other conventional models, the LSTM achieves higher prediction performance when it has two hidden layers. The operating strategy obtained by DER-DDPG effectively reduces desulfurization operating cost while satisfying the system's actual process-parameter requirements. Compared with the DQN and DDPG algorithms, it has greater practical value and can meet the needs of desulfurization system operation optimization.
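
The abstract names a dual experience replay mechanism based on cumulative return but does not say how samples are routed between the two pools. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a finished episode goes to an "elite" pool when its cumulative return exceeds the running mean of recent episode returns, and to a "regular" pool otherwise, and that each training batch mixes the two pools at a fixed ratio. The class name `DualReplayBuffer`, the parameter `elite_fraction`, and the routing rule are all hypothetical.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two replay pools; episodes are routed by cumulative return (assumed rule)."""

    def __init__(self, capacity=10_000, elite_fraction=0.5, seed=None):
        self.elite = deque(maxlen=capacity)      # transitions from high-return episodes
        self.regular = deque(maxlen=capacity)    # transitions from all other episodes
        self.elite_fraction = elite_fraction     # share of each batch taken from the elite pool
        self.recent_returns = deque(maxlen=100)  # running baseline of episode returns
        self.rng = random.Random(seed)

    def add_episode(self, transitions):
        """Store one finished episode of (state, action, reward, next_state, done) tuples."""
        episode_return = sum(t[2] for t in transitions)
        if self.recent_returns:
            baseline = sum(self.recent_returns) / len(self.recent_returns)
            # Assumption: "elite" means strictly above the running mean return.
            pool = self.elite if episode_return > baseline else self.regular
        else:
            # No history yet, so the first episode goes to the regular pool.
            pool = self.regular
        pool.extend(transitions)
        self.recent_returns.append(episode_return)
        return episode_return

    def sample(self, batch_size):
        """Draw a mixed batch; each share is capped by the size of its pool."""
        n_elite = min(len(self.elite), int(batch_size * self.elite_fraction))
        n_regular = min(len(self.regular), batch_size - n_elite)
        batch = (self.rng.sample(list(self.elite), n_elite)
                 + self.rng.sample(list(self.regular), n_regular))
        self.rng.shuffle(batch)
        return batch
```

In a DDPG training loop, `add_episode` would be called once per finished episode and `sample` once per gradient step. A fixed mixing ratio is only one design choice; the ratio could also be annealed over training.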
