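- Almost all NLP models perform worse on long sentences. Overall SRL accuracy is around 80%, and on long sentences it may be only about 70%.
- The root cause is that current SRL has abandoned constituency parsing, so it does not handle this kind of recursive structure well. @yzhangcs may be interested in researching ways to bring constituency back, so let's wait and see.
- In addition, Chinese corpus construction still lags far behind English. The same sentence, translated into English, is analyzed better than the Chinese original: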
Tok         SRL PA1      Tok         SRL PA2      Tok         SRL PA3
─────────── ──────────── ─────────── ──────────── ─────────── ────────
Yang        ◄─┐          Yang                     Yang
Jiechi        │          Jiechi                   Jiechi
and           ├►ARG0     and                      and
Sullivan    ◄─┘          Sullivan                 Sullivan
met         ╟──►PRED     met                      met
this        ◄─┐          this                     this
week        ◄─┴►ARGM-TMP week                     week
,                        ,                        ,
another                  another                  another
meeting                  meeting                  meeting
between                  between                  between
officials                officials                officials
from                     from                     from
both                     both                     both
sides                    sides                    sides
since                    since                    since
Biden                    Biden                    Biden       ───►ARG0
took                     took                     took        ╟──►PRED
office                   office                   office      ───►ARG1
,                        ,                        ,
but                      but                      but
there                    there                    there
was                      was         ╟──►PRED     was
not                      not         ───►ARGM-NEG not
much                     much        ◄─┐          much
substantive              substantive   ├►ARG1     substantive
progress                 progress    ◄─┘          progress
in                       in          ◄─┐          in
their                    their         │          their
previous                 previous      ├►ARGM-LOC previous
meetings                 meetings    ◄─┘          meetings
.                        .                        .
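
For anyone who wants to work with the result rather than read the arrows, the three predicate-argument structures in the table above can be written down as labeled token spans. This is only a minimal sketch: the `(label, start, end)` encoding and the `render` helper are assumptions made for illustration, not necessarily the format the SRL tool itself returns. The spans are transcribed by hand from the table.

```python
# Tokens of the English sentence, in the same order as the table rows.
tokens = (
    "Yang Jiechi and Sullivan met this week , another meeting between "
    "officials from both sides since Biden took office , but there was "
    "not much substantive progress in their previous meetings ."
).split()

# Hypothetical encoding: one list per predicate, each entry is
# (label, start, end) with `end` exclusive, indexing into `tokens`.
pa1 = [("ARG0", 0, 4), ("PRED", 4, 5), ("ARGM-TMP", 5, 7)]
pa2 = [("PRED", 22, 23), ("ARGM-NEG", 23, 24),
       ("ARG1", 24, 27), ("ARGM-LOC", 27, 31)]
pa3 = [("ARG0", 16, 17), ("PRED", 17, 18), ("ARG1", 18, 19)]


def render(tokens, frame):
    """Map each role label in one predicate-argument structure to its text."""
    return {label: " ".join(tokens[start:end]) for label, start, end in frame}


for frame in (pa1, pa2, pa3):
    print(render(tokens, frame))
```

Written this way, it is easy to see that each structure is a flat set of spans around one predicate, with nothing linking one structure to another.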
Finally, SRL is already a "sunset industry"; AMR is the future.