Can SRL processing be improved further?

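Example sentence: 杨洁篪与沙利文本周会面,是拜登上台后双方官员再次会晤,但先前会面并无太多实质进展。 (Yang Jiechi and Sullivan met this week, another meeting between officials from both sides since Biden took office, but there was not much substantive progress in their previous meetings.)
The result is as follows: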

The text in the blue box gets no finer-grained SRL analysis, and the text in the red box gets no SRL analysis at all.


Analyzing the blue-box text on its own gives the result below, which shows that a finer-grained analysis of that span is possible:
[Screenshot, 2021-10-08 7:49:01 PM]


Analyzing the red-box text on its own also produces a result:
[Screenshot, 2021-10-08 7:50:30 PM]


Cases like this come up often when I analyze my corpus. Is there room for further improvement? :grinning_face_with_smiling_eyes::question:


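If there is no fix for now, would the following work with the current output:
take the tokens that tokenization and POS tagging produced for the blue-box and red-box text, and run SRL on each of them again separately?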
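
To make the idea concrete, here is a rough sketch of that second pass. Everything in it is an assumption for illustration: `srl_fn` is a hypothetical callable standing in for whatever SRL model is used, and I assume each frame is a list of `(text, role, start, end)` tuples with token offsets and an exclusive end. `uncovered_spans` and `reparse_spans` are helper names I made up, not part of any library.

```python
from typing import Callable, List, Tuple

# Assumed frame layout: a list of (text, role, start, end), end exclusive.
Frame = List[Tuple[str, str, int, int]]
# Hypothetical SRL callable: takes a token list, returns a list of frames.
SrlFn = Callable[[List[str]], List[Frame]]
Span = Tuple[int, int]


def uncovered_spans(n_tokens: int, frames: List[Frame], min_len: int = 2) -> List[Span]:
    """Return maximal runs of at least min_len tokens that no frame touches."""
    covered = [False] * n_tokens
    for frame in frames:
        for _text, _role, start, end in frame:
            for i in range(start, min(end, n_tokens)):
                covered[i] = True

    spans: List[Span] = []
    run_start = None
    # The trailing True is a sentinel that closes a run ending at the last token.
    for i, flag in enumerate(covered + [True]):
        if not flag and run_start is None:
            run_start = i
        elif flag and run_start is not None:
            if i - run_start >= min_len:
                spans.append((run_start, i))
            run_start = None
    return spans


def reparse_spans(tokens: List[str], spans: List[Span], srl_fn: SrlFn) -> List[Frame]:
    """Run srl_fn on each token span and shift offsets back to the full sentence."""
    extra: List[Frame] = []
    for start, end in spans:
        for frame in srl_fn(tokens[start:end]):
            extra.append([(text, role, s + start, e + start)
                          for text, role, s, e in frame])
    return extra
```

The red-box case would be covered by `uncovered_spans`. For the blue-box case, the `(start, end)` of the coarse argument could be passed to `reparse_spans` directly. Since the second pass sees the span without its surrounding context, its frames may not agree with the first pass and would need to be merged with care.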


Also, if I remove the two characters “太多” ("much") from the red-box text, the blue-box text does get a finer-grained result :sweat_smile:

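  1. Almost all NLP models perform worse on long sentences. Overall SRL accuracy is around 80%, and on long sentences it may be only 70%.
  2. The root of the problem is that current SRL has abandoned constituency parsing, so it does not handle recursive structures like this one well. @yzhangcs may be interested in researching ways to bring constituency back; let's wait and see.
  3. Also, Chinese corpus construction still lags far behind English. This sentence is analyzed better after being translated into English than in the original Chinese: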
Tok        	SRL PA1     	Tok        	SRL PA2     	Tok        	SRL PA3 
───────────	────────────	───────────	────────────	───────────	────────
Yang       	◄─┐         	Yang       	            	Yang       	        
Jiechi     	  │         	Jiechi     	            	Jiechi     	        
and        	  ├►ARG0    	and        	            	and        	        
Sullivan   	◄─┘         	Sullivan   	            	Sullivan   	        
met        	╟──►PRED    	met        	            	met        	        
this       	◄─┐         	this       	            	this       	        
week       	◄─┴►ARGM-TMP	week       	            	week       	        
,          	            	,          	            	,          	        
another    	            	another    	            	another    	        
meeting    	            	meeting    	            	meeting    	        
between    	            	between    	            	between    	        
officials  	            	officials  	            	officials  	        
from       	            	from       	            	from       	        
both       	            	both       	            	both       	        
sides      	            	sides      	            	sides      	        
since      	            	since      	            	since      	        
Biden      	            	Biden      	            	Biden      	───►ARG0
took       	            	took       	            	took       	╟──►PRED
office     	            	office     	            	office     	───►ARG1
,          	            	,          	            	,          	        
but        	            	but        	            	but        	        
there      	            	there      	            	there      	        
was        	            	was        	╟──►PRED    	was        	        
not        	            	not        	───►ARGM-NEG	not        	        
much       	            	much       	◄─┐         	much       	        
substantive	            	substantive	  ├►ARG1    	substantive	        
progress   	            	progress   	◄─┘         	progress   	        
in         	            	in         	◄─┐         	in         	        
their      	            	their      	  │         	their      	        
previous   	            	previous   	  ├►ARGM-LOC	previous   	        
meetings   	            	meetings   	◄─┘         	meetings   	        
.          	            	.          	            	.          	        

Finally, SRL is already a "sunset industry"; AMR is the future.

Looking forward to being able to use AMR soon :star_struck: