Can SRL processing be further optimized?

AliBug · October 8, 2021, 12:03pm

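Example sentence: 杨洁篪与沙利文本周会面，是拜登上台后双方官员再次会晤，但先前会面并无太多实质进展。
(Yang Jiechi and Sullivan meet this week, another meeting between officials of the two sides since Biden took office, but their previous meetings made little substantive progress.)
The result is as follows: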

In this result, the text in the blue box gets no finer-grained SRL analysis, and the text in the red box gets no SRL analysis at all.

Analyzing the blue-box text on its own gives the result below, which shows that a finer-grained result is possible for this clause:
[Screenshot, 2021-10-08 7:49:01 PM]

Analyzing the red-box text on its own also yields a result:
[Screenshot, 2021-10-08 7:50:30 PM]

Cases like this come up frequently when we analyze our corpus. Is there room for further improvement?

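If there is no fix for now, could we work from the existing results like this:
take the tokens produced by tokenization and POS tagging for the blue-box and red-box text and run SRL on each of them again separately?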
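The workaround proposed above can be sketched in a library-agnostic way: split the token list at clause-level punctuation, run SRL on each clause on its own, then shift the returned offsets back to sentence positions. This is a minimal sketch under stated assumptions: `srl_fn` is a hypothetical stand-in for whatever SRL call you make on pre-tokenized input, and each argument is assumed to be a `(label, start, end)` token span with an exclusive end. Neither is a confirmed HanLP API.

```python
from typing import Callable, List, Sequence, Tuple

# One argument as (label, start, end) over token indices, end exclusive.
# This layout is an assumption for illustration, not a specific library's format.
Arg = Tuple[str, int, int]
PAStructure = List[Arg]

CLAUSE_DELIMITERS = frozenset({"，", "。", "；", ",", ";", "."})


def clause_spans(tokens: Sequence[str], delimiters=CLAUSE_DELIMITERS) -> List[Tuple[int, int]]:
    """Split a token list into [start, end) spans at clause-level punctuation."""
    spans: List[Tuple[int, int]] = []
    start = 0
    for i, tok in enumerate(tokens):
        if tok in delimiters:
            if i > start:
                spans.append((start, i))
            start = i + 1  # the delimiter itself belongs to no span
    if start < len(tokens):
        spans.append((start, len(tokens)))
    return spans


def srl_on_subspans(
    tokens: Sequence[str],
    spans: Sequence[Tuple[int, int]],
    srl_fn: Callable[[List[str]], List[PAStructure]],
) -> List[PAStructure]:
    """Run the hypothetical srl_fn on each sub-span separately and shift its
    offsets (relative to the sub-span) back to sentence-level positions."""
    merged: List[PAStructure] = []
    for start, end in spans:
        for pa in srl_fn(list(tokens[start:end])):
            merged.append([(label, s + start, e + start) for label, s, e in pa])
    return merged
```

In practice you would probably re-run only the spans that the full-sentence pass left unlabeled and then merge both result sets. Splitting at commas is a crude heuristic and can cut an argument in two.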

Also, if the word “太多” (“much”) is removed from the red-box text, the blue-box text does get a finer-grained result.

hankcs · October 8, 2021, 4:12pm

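Almost all NLP models perform worse on long sentences. SRL's overall accuracy is about 80%, and on long sentences it may drop to only 70%.
The root cause is that current SRL has abandoned constituency parsing, so it handles recursive structures like this one poorly. @yzhangcs may be interested in researching ways to bring constituency structure back; let's wait and see.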
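To check whether a similar length effect shows up on your own corpus, evaluation counts can be bucketed by sentence length. A minimal sketch: the `(sentence_length, n_correct, n_total)` records are hypothetical per-sentence counts (for example, matched arguments out of gold arguments), and the 20-token bucket width is an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_length(
    records: Iterable[Tuple[int, int, int]],
    bucket_size: int = 20,
) -> Dict[Tuple[int, int], float]:
    """Aggregate (sentence_length, n_correct, n_total) records into
    length buckets [lo, lo + bucket_size) and return accuracy per bucket."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for length, n_correct, n_total in records:
        lo = (length // bucket_size) * bucket_size  # lower edge of the bucket
        correct[lo] += n_correct
        total[lo] += n_total
    # Skip empty buckets to avoid division by zero.
    return {
        (lo, lo + bucket_size): correct[lo] / total[lo]
        for lo in sorted(total)
        if total[lo] > 0
    }
```

Pooling counts per bucket (rather than averaging per-sentence accuracies) weights each bucket by how many arguments it contains.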
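Also, Chinese corpus construction still lags far behind English. If this sentence is translated into English, the analysis comes out better than for the Chinese: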

Tok        	SRL PA1     	Tok        	SRL PA2     	Tok        	SRL PA3 
───────────	────────────	───────────	────────────	───────────	────────
Yang       	◄─┐         	Yang       	            	Yang       	        
Jiechi     	  │         	Jiechi     	            	Jiechi     	        
and        	  ├►ARG0    	and        	            	and        	        
Sullivan   	◄─┘         	Sullivan   	            	Sullivan   	        
met        	╟──►PRED    	met        	            	met        	        
this       	◄─┐         	this       	            	this       	        
week       	◄─┴►ARGM-TMP	week       	            	week       	        
,          	            	,          	            	,          	        
another    	            	another    	            	another    	        
meeting    	            	meeting    	            	meeting    	        
between    	            	between    	            	between    	        
officials  	            	officials  	            	officials  	        
from       	            	from       	            	from       	        
both       	            	both       	            	both       	        
sides      	            	sides      	            	sides      	        
since      	            	since      	            	since      	        
Biden      	            	Biden      	            	Biden      	───►ARG0
took       	            	took       	            	took       	╟──►PRED
office     	            	office     	            	office     	───►ARG1
,          	            	,          	            	,          	        
but        	            	but        	            	but        	        
there      	            	there      	            	there      	        
was        	            	was        	╟──►PRED    	was        	        
not        	            	not        	───►ARGM-NEG	not        	        
much       	            	much       	◄─┐         	much       	        
substantive	            	substantive	  ├►ARG1    	substantive	        
progress   	            	progress   	◄─┘         	progress   	        
in         	            	in         	◄─┐         	in         	        
their      	            	their      	  │         	their      	        
previous   	            	previous   	  ├►ARGM-LOC	previous   	        
meetings   	            	meetings   	◄─┘         	meetings   	        
.          	            	.          	            	.
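The table encodes three predicate-argument structures over the same 32 tokens. A minimal sketch that stores them as `(label, start, end)` token spans with an exclusive end (an assumed layout chosen for illustration) recovers each argument's text and lists the tokens that no structure covers, which is the kind of gap the original question is about.

```python
from typing import List, Sequence, Tuple

Arg = Tuple[str, int, int]  # (label, start, end), end exclusive; assumed layout

TOKENS = ("Yang Jiechi and Sullivan met this week , another meeting between "
          "officials from both sides since Biden took office , but there was not "
          "much substantive progress in their previous meetings .").split()

# The three predicate-argument structures from the table, as token spans.
PA1 = [("ARG0", 0, 4), ("PRED", 4, 5), ("ARGM-TMP", 5, 7)]
PA2 = [("PRED", 22, 23), ("ARGM-NEG", 23, 24), ("ARG1", 24, 27), ("ARGM-LOC", 27, 31)]
PA3 = [("ARG0", 16, 17), ("PRED", 17, 18), ("ARG1", 18, 19)]


def render(tokens: Sequence[str], pa: List[Arg]) -> List[Tuple[str, str]]:
    """Return (label, argument text) pairs for one structure."""
    return [(label, " ".join(tokens[s:e])) for label, s, e in pa]


def uncovered(tokens: Sequence[str], pas: List[List[Arg]]) -> List[int]:
    """Indices of tokens that no argument or predicate in any structure covers."""
    covered = {i for pa in pas for _, s, e in pa for i in range(s, e)}
    return [i for i in range(len(tokens)) if i not in covered]
```

For example, `render(TOKENS, PA1)` returns `[("ARG0", "Yang Jiechi and Sullivan"), ("PRED", "met"), ("ARGM-TMP", "this week")]`, and `uncovered(TOKENS, [PA1, PA2, PA3])` shows that "another meeting between officials from both sides since" receives no role even in the English analysis.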

Finally, SRL is already a “sunset industry”; AMR is the future.

AliBug · October 9, 2021, 12:50am

Looking forward to being able to use AMR soon.