Few-shot Information Extraction: Pre-train, Prompt, Entail

EAI - Northeastern University

Eneko Agirre
Director of HiTZ, Basque Center for Language Technology (UPV/EHU)
@eagirre · https://hitz.eus/eneko/

In collaboration with ...

We are hiring!

Few-shot Information Extraction?

● Adoption of NLP in companies is deterred by the high effort required of domain experts
– In the case of Information Extraction: define non-trivial schemas with the entities and relations of interest, annotate a corpus, train a supervised ML system
– Define, annotate, train

● Verbalize while defining, interactive workflow
– Domain expert defines entities and relations in English
– Runs the definitions on examples
– Annotates a handful of incorrect examples, iterates

Example schema definitions:

Named-Entity Classification (NEC)
– PERSON: Each distinct person or set of people mentioned in a doc.
– ORG: ...  GPE: ...  DATE: ...

Event Extraction (EE)
– LIFE.DIE: A DIE Event occurs whenever the life of a PERSON Entity ends.

Relation Extraction (RE)
– EMPLOYEEOF: Employment captures the relationship between Persons and their employers. This Relation is only taggable when it can be reasonably assumed that the PER is paid by the ORG or GPE.

Event Argument Extraction (EAE)
– VICTIM-ARG: The person(s) who died
– PLACE-ARG: Where the death takes place

Annotation guidelines:
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-entities-guidelines-v6.6.pdf
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-events-guidelines-v5.4.3.pdf
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-relations-guidelines-v6.2.pdf

Example sentence:
John Smith, an executive at XYZ Co., died in Florida on Sunday.

Fully annotated:
John Smith[PERSON], an executive at XYZ Co.[ORGANIZATION], died[LIFE.DIE] in Florida[GPE] on Sunday[DATE].
Relation: EMPLOYEEOF (John Smith, XYZ Co.)
Event arguments of the LIFE.DIE event: VICTIM-ARG (John Smith), PLACE-ARG (Florida), TIME-ARG (Sunday)

Few-shot Information Extraction?

● Interactive workflow: verbalize while defining
– Domain expert defines entities and relations in English
– Runs the definitions on examples
– Annotates a handful of incorrect examples, iterates

The definitions are verbalized as templates:

NEC VERBALIZATIONS
{X} is a person. → PERSON
{X} is a date. → DATE

EVENT VERBALIZATIONS
{E} refers to a death. → LIFE.DIE

RELATION VERBALIZATIONS
{X} is employed by {Y} → EMPLOYEEOF

EVENT ARGUMENT VERBALIZATIONS
{X} died. → VICTIM-ARG
Someone {E} in {X}. → PLACE-ARG

Applied to the example:
John Smith[PERSON], an executive at XYZ Co.[ORGANIZATION], died[LIFE.DIE] in Florida[GPE] on Sunday[DATE].
EMPLOYEEOF, VICTIM-ARG, PLACE-ARG, TIME-ARG
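To make the verbalization idea concrete, here is a minimal sketch of how such templates can be stored and filled in code; the dictionary layout and the fill() helper are illustrative, not the demo system's actual data structures.

# Minimal sketch: verbalizations as templates keyed by label (illustrative only).
NEC_VERBALIZATIONS = {
    "PERSON": "{X} is a person.",
    "DATE": "{X} is a date.",
}
RELATION_VERBALIZATIONS = {
    "EMPLOYEEOF": "{X} is employed by {Y}",
}
EVENT_ARGUMENT_VERBALIZATIONS = {
    "VICTIM-ARG": "{X} died.",
    "PLACE-ARG": "Someone {E} in {X}.",
}

def fill(template: str, **spans: str) -> str:
    """Replace the {X}/{Y}/{E} placeholders with text spans from the sentence."""
    return template.format(**spans)

print(fill(RELATION_VERBALIZATIONS["EMPLOYEEOF"], X="John Smith", Y="XYZ Co."))
# John Smith is employed by XYZ Co.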

Few-shot Information Extraction?

Define, annotate, train vs. Interactive workflow: verbalize while defining

● 10 times more effective in terms of domain-expert time

● Friendlier for domain experts


Few-shot Information Extraction?

Thanks to the latest advances:

● Large pre-trained language models (LMs)

● Recast IE into natural language instructions and prompts

But even the largest LMs have limited inference ability:

● Enhance the inference abilities of LMs with entailment datasets

● Recast IE as an entailment problem

Plan for the talk

● Pre-trained Language Models

● Prompting

● Entailment

● Few-shot Information Extraction


Pre-trained Language Models

1) Self-supervised LM pre-training

– Unlabelled data: HUGE corpora: Wikipedia, news, web crawl, social media, etc.

– Train some variant of a Language Model

2) Supervised pre-training

– Very common in vision (ImageNet), standalone; in NLP, in conjunction with self-supervised LM pre-training

– Task-specific: e.g. transfer from one Q&A dataset to another

– Pivot task: e.g. entailment or Q&A (e.g. Sainz et al. 2021; Wang et al. 2021)

– All available tasks (e.g. T0, Sanh et al. 2021)

Self-supervised LM pre-training

Informally, learn parameters ϴ using some variant of Pϴ(text | some other text)

Two main variants:

● (Causal) Language Model (GPT)

● Masked Language Model (BERT)
– Self-attention: left and right
– Loss: masked words
– At inference it can fill explicitly masked tokens

[CLS] The sky is [MASK] . [SEP]
→ blue = 20.60%  red = 6.15%  clear = 5.84%  orange = 4.11% ...

Pre-Trained Models: Past, Present and Future (Han et al. 2021)
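As an illustration of masked-token filling, here is a minimal sketch using the Hugging Face transformers fill-mask pipeline; the checkpoint is illustrative, and the probabilities it returns will differ from the ones on the slide.

# Minimal sketch: fill an explicitly masked token with a masked language model.
# Checkpoint choice is illustrative; actual probabilities will differ from the slide.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The sky is [MASK].", top_k=4):
    # Each candidate carries the predicted token and its probability.
    print(f"{candidate['token_str']:>8} = {candidate['score']:.2%}")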

Fine-tuning on a specific task

Sentence classification: add a classification head on top of the [CLS] token

Example: Sentiment Analysis
Training example: (The sky is fantastic, Positive)

[CLS] The sky is fantastic . [SEP]
→ Positive = 82%  Negative = 18%

Pre-Trained Models: Past, Present and Future (Han et al. 2021)
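A minimal sketch of this fine-tuning setup, with a classification head on top of the [CLS] representation; the checkpoint, label encoding, and single-example "training step" are illustrative only.

# Minimal sketch: sentence classification by fine-tuning a classification head on [CLS].
# Checkpoint and label encoding (1 = Positive, 0 = Negative) are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer("The sky is fantastic", return_tensors="pt")
labels = torch.tensor([1])                 # one training example: (text, Positive)

outputs = model(**batch, labels=labels)    # loss for this fine-tuning step
outputs.loss.backward()                    # gradients for a parameter update

probs = outputs.logits.softmax(dim=-1)     # e.g. [P(Negative), P(Positive)]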

Why do Pre-trained LMs work so well?

● Language modeling is a very difficult task, even for humans.

– LMs compress any possible context into a vector that generalizes over possible completions.

– Forced to learn syntax, semantics, encode facts about the world, etc.

● LMs consume huge amounts of data

● The fine-tuning stage exploits the knowledge about language already in the LM, instead of starting from scratch

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction


What is prompt learning?

Rationale: recast NLP tasks into natural language, so that pre-trained Language Models can apply their knowledge about language and the world

Related ideas: zero-shot and few-shot

Learn a task with a minimal task description:
– Instructions on what the task is
– Present the task to the LM as a prompt
– (Few-shot) prepend a handful of labeled examples

Sentiment analysis

Task: classify "The sky is fantastic." as Positive or Negative.

Fine-tuned LM:
[CLS] The sky is fantastic . [SEP] → Positive = 82%  Negative = 18%

LM prompting (zero-shot), with a frozen masked LANGUAGE MODEL:

The sky is fantastic . It was [MASK] !

P1 = P(great | The sky is fantastic. It was [MASK] !)
P2 = P(terrible | The sky is fantastic. It was [MASK] !)
If P1 > P2 then Positive
e.g. great = 12%, terrible = 4% → Positive

Language Models are Few-Shot Learners (Brown et al. 2020)
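A minimal sketch of the zero-shot recipe above: prompt a frozen masked LM and compare the [MASK] probabilities of the two verbalizer words; the model and template are illustrative.

# Minimal sketch of zero-shot prompting with a frozen masked LM:
# compare the [MASK] probabilities of "great" vs. "terrible". No parameters are updated.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def verbalizer_prob(text: str, word: str) -> float:
    """Probability of `word` at the [MASK] position of the prompted input."""
    prompt = f"{text} It was {tokenizer.mask_token} !"
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    word_id = tokenizer.convert_tokens_to_ids(word)
    return logits.softmax(dim=-1)[word_id].item()

text = "The sky is fantastic."
label = "Positive" if verbalizer_prob(text, "great") > verbalizer_prob(text, "terrible") else "Negative"
print(label)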

LM prompting (few-shot): in-context learning

Training data:
Text: I’m not sure I like it.            Label: Negative
Text: Thank you for the amazing help.    Label: Positive

The labeled examples are verbalized and prepended to the input:
S1 = I’m not sure I like it. It was terrible!
S2 = Thank you for the amazing help. It was great!
S  = The sky is fantastic. It was ____

Frozen LANGUAGE MODEL, no parameter update:
P1 = P(great | S1 \n S2 \n The sky is fantastic. It was )
P2 = P(terrible | S1 \n S2 \n The sky is fantastic. It was )
If P1 > P2 then Positive

Language Models are Few-Shot Learners (Brown et al. 2020)
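A minimal sketch of in-context learning with a frozen causal LM, using GPT-2 as a stand-in for GPT-3; the demonstrations and verbalizer words follow the slide, and no parameters are updated.

# Minimal sketch of in-context (few-shot) prompting with a frozen causal LM.
# GPT-2 stands in for GPT-3 here; prompt and verbalizer words are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = (
    "I'm not sure I like it. It was terrible!\n"
    "Thank you for the amazing help. It was great!\n"
    "The sky is fantastic. It was"
)

def next_word_prob(word: str) -> float:
    """Probability of `word` as the continuation of the prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]        # distribution over the next token
    word_id = tokenizer(" " + word).input_ids[0]      # GPT-2 tokens carry a leading space
    return logits.softmax(dim=-1)[word_id].item()

label = "Positive" if next_word_prob("great") > next_word_prob("terrible") else "Negative"
print(label)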

Domain experts provide templates / label maps

Template: [x] It was __ !
Label map: great <=> positive

I’m not sure I like it. It was terrible!
Thank you for the amazing help. It was great!
The sky is fantastic. It was ____

Template: Review: [x] Sentiment: __
Label map: positive <=> positive

Review: I’m not sure I like it. Sentiment: negative
Review: Thank you for the amazing help. Sentiment: positive
Review: The sky is fantastic. Sentiment: ____

LM prompting (zero-shot and few-shot): in-context learning (Brown et al. 2020)

● Good results with the largest GPT-3 models (175B)

● Even though there is no parameter update

● Large variance depending on prompts (templates and label map)

Few-shot learning with prompts and parameter updates

Traditional fine-tuning:
– Training example: (The sky is fantastic, Positive)
– [CLS] The sky is fantastic . [SEP] → Positive = 82%  Negative = 18%
– Low results in the few-shot setting

Fine-tune the LM using prompted datasets, usually a smaller LM (e.g. PET):
– Training example – input and label: (The sky is fantastic, Positive)
– Prompted training example – input and label: (The sky is fantastic. It was [MASK] !, great)
– The sky is fantastic . It was [MASK] ! → great = 12%  terrible = 4%

PET outperforms GPT-3 with 1000x fewer parameters (via ensembling and iterations)

Exploiting Cloze Questions for Few Shot Text Classification and NLI (Schick and Schütze, 2020)
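A minimal PET-style sketch, assuming the template and verbalizer from the slide: the labeled example is recast as a cloze example and the masked LM is fine-tuned to predict the verbalizer word at the [MASK] position.

# Minimal PET-style sketch: turn a labeled example into a cloze (prompted) training
# example. Template, verbalizer, and checkpoint are illustrative, not PET's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

verbalizer = {"Positive": "great", "Negative": "terrible"}

def prompted_example(text: str, label: str):
    """(text, label) -> (input ids with [MASK], label ids for the verbalizer word)."""
    prompt = f"{text} It was {tokenizer.mask_token} !"
    enc = tokenizer(prompt, return_tensors="pt")
    labels = torch.full_like(enc.input_ids, -100)            # ignore all positions...
    mask_pos = (enc.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    labels[0, mask_pos] = tokenizer.convert_tokens_to_ids(verbalizer[label])
    return enc, labels                                        # ...except the [MASK] slot

enc, labels = prompted_example("The sky is fantastic.", "Positive")
loss = model(**enc, labels=labels).loss                       # one fine-tuning step's loss
loss.backward()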

Few-shot learning with prompts and parameter updates

● T-Few outperforms GPT-3 on held-out T0 tasks

● With 80 times fewer parameters

● (The chart shows efficiency at inference)

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (Liu et al. 2022)

Conclusions on prompting

● Size of models and update of parameters
– Larger LMs, no update: best zero-shot, strong few-shot
– Smaller LMs, with updates: best few-shot

● Inference ability of LMs is limited:
– Poor results on entailment datasets (Brown et al. 2020)
– BIG-BENCH: model performance and calibration both improve with scale, but are poor in absolute terms (Srivastava et al. 2022)
– No wonder: LMs are capped by the phenomena needed to predict masked words, so there is no need to learn anything else

Conclusions on prompting

Improving inference ability is an open problem:

● Chain-of-thought (fine-tuning)

● Prompted datasets - instructions (fine-tuning)

● Reinforcement learning with human feedback

● Combine LMs with reasoners and tools

Our proposal: teach inference ability via labeled entailment datasets

PaLM: Scaling Language Modeling with Pathways (Chowdhery et al. 2022)

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks (Wang et al. 2022)

Training language models to follow instructions with human feedback (Ouyang et al. 2022)

Augmented Language Models: a Survey (Mialon et al. 2023)

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction


Textual Entailment (RTE), Natural Language Inference (NLI)

Dagan et al. 2005 (refined by Manning et al. 2006)

● We say that a Text entails a Hypothesis if, typically, a human reading the Text would infer that the Hypothesis is most likely true.

Text (Premise): I’m not sure what the overnight low was
Hypothesis: I don't know how cold it got last night.
Labels: {entailment, contradiction, neutral}

(Example from the Bowman and Zhu NAACL 2019 tutorial)

NLI datasets are widely used to measure the quality of models. To perform well, models need to tackle several linguistic phenomena:
– Lexical entailment (cat vs. animal, cat vs. dog)
– Quantification (all, most, fewer than eight)
– Lexical ambiguity and scope ambiguity (bank, ...)
– Modality (might, should, ...)
– Common-sense background knowledge
– ...
– Compositional interpretation without grounding

Textual Entailment (RTE),

Natural Language Inference (NLI)

Common tasks can be cast as entailment premise-hypothesis pairs:

● Information Extraction: Given a text (premise), check whether it entails a relation (hypothesis)

● Question Answering: given a question (premise) identify a text that entails an answer (hypothesis)

● Information Retrieval: Given a query (hypothesis) identify texts that entail the query (premise)

● Summarization ...


Textual Entailment (RTE), Natural Language Inference (NLI)

Datasets:

● RTE 1-7 (Dagan et al. 2006-2012)
– Premises (texts) drawn from naturally occurring text; expert-constructed hypotheses. 5,000 examples.

● SNLI, MultiNLI (Bowman et al. 2015; Williams et al. 2017)
– Crowdsourcers provided hypotheses for captions; MultiNLI extended to other genres. 1 million examples.
– Biases in hypotheses (Gururangan et al., 2018; Poliak et al., 2018)
– Data generation with naïve annotators (Geva et al. 2019), artefacts

● FEVER-NLI (Nie et al. 2019)
– Fact verification dataset. 200,000 examples.

● ANLI (Nie et al. 2020)
– Manually created adversarial examples. 168,000 examples.

Textual Entailment (RTE), Natural Language Inference (NLI)

Fine-tune an MLM on NLI (Devlin et al. 2019):

[CLS] Premise [SEP] Hypothesis [SEP]
→ Entailment = 72%  Contradiction = 12%  Neutral = 16%
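A minimal sketch of scoring a premise-hypothesis pair with an off-the-shelf NLI model; roberta-large-mnli is one publicly available checkpoint, used here only as an example.

# Minimal sketch: score a premise/hypothesis pair with an off-the-shelf NLI model.
# Label order is read from the checkpoint's own config.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

premise = "I'm not sure what the overnight low was"
hypothesis = "I don't know how cold it got last night."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

for i, label in model.config.id2label.items():
    print(f"{label}: {probs[i]:.1%}")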

Textual Entailment (RTE), Natural Language Inference (NLI)

GPT-3 using prompts: the premise and the hypothesis are presented in a prompt and the model predicts the label.

GPT-3 using prompts fails:

"These results on both RTE and ANLI suggest that NLI is still a very difficult task for language models"
Language Models are Few-Shot Learners (Brown et al. 2020)

Also confirmed for InstructGPT3 and PaLM 540B
● Results of PaLM only improved when fine-tuning on NLI data
PaLM: Scaling Language Modeling with Pathways (Chowdhery et al. 2022)

Textual Entailment (RTE), Natural Language Inference (NLI)

GPT-3 using prompts fails on the diagnostic NLI dataset (Wang et al., 2019), also used in the SuperGLUE leaderboard.

Matthews Correlation scores from the SuperGLUE leaderboard, by phenomenon (low to high):

Double Negation: 0.0     Morphological Negation: 0.0    Anaphora/Coreference: 1.7
Nominalization: 2.6      Downward Monotone: 3.6         Conjunction: 4.0
Existential: 6.1         Disjunction: 7.4               Logic: 10.6
Negation: 11.6           Temporal: 12.4                 Common Sense: 28.4
Lexical Semantics: 30.0  Factivity: 31.6                Knowledge: 32.0
World Knowledge: 33.0    Active/Passive: 34.5           Universal: 39.6
Intersectivity: 41.4     Restrictivity: 48.5            Quantifiers: 59.5

"These results on both RTE and ANLI suggest that NLI is still a very difficult task for language models"
Language Models are Few-Shot Learners (Brown et al. 2020)

Also confirmed for PaLM 540B (Chowdhery et al. 2022)

Overcoming limitations of LM

LMs fail on many inferences in NLI datasets

Our hypothesis:

Fine-tuning LMs on NLI datasets allows them to learn certain inferences ...

… which they will then apply on target tasks

Entailment as Few-Shot Learner (Wang et al. 2021)

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction

Few-shot Information Extraction?

Our proposal:

● Use “smaller” language models

● Additional pre-training with NLI datasets => Entailment Models

● Recast IE tasks into text-hypothesis pairs

● Run the entailment model (zero-shot)

● Fine-tune the entailment model (few-shot, full train)

We will present our work on:

● Relation extraction (Sainz et al. 2021, EMNLP)

● Event-argument extraction (Sainz et al. 2022, NAACL findings)

● Several IE tasks (Sainz et al. 2022, NAACL demo)

Entailment for prompt-based Relation Extraction (Sainz et al. 2021, EMNLP)

Given two entities e1 and e2 and a context c, predict the schema relation (if any) holding between the two entities in the context.

text: Billy Mays, the bearded, boisterous pitchman who, as the undisputed king of TV yell and sell, became an unlikely pop culture icon, died at his home in Tampa, Fla, on Sunday.
entities: Billy Mays (PERSON), Tampa (CITY)
gold relation: per:city_of_death

1) Verbalizer: each relation is verbalized by one or more templates, e.g. "e1 died in e2" for per:city_of_death. Filling the templates with the entity pair yields the candidate hypotheses:
– Billy Mays was born in Tampa.
– Billy Mays’s birthday is on Tampa.
– Billy Mays is Tampa years old.
– ...
– Billy Mays died in Tampa.

2) Run the entailment (NLI) model: compute the entailment / neutral / contradiction probabilities for each hypothesis independently, with the text as premise.

3) Relation probability inference: the probability of relation r combines the hypothesis probabilities with the entity-type constraints:

P_r(x, e1, e2) = δ_r(type(e1), type(e2)) · max_{t ∈ T_r} P_entailment(t(e1, e2) | x)

where T_r is the set of templates verbalizing relation r and the δ_r function encodes the entity-type constraints of r; for the example, P_per:city_of_death(x, Billy Mays, Tampa) = δ_r(PERSON, CITY) · max over T_r.

4) Finally, we return the relation with the highest probability; if none of the relations is entailed, then r = no_relation.

No relation-annotated examples are needed: ZERO-SHOT.
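A minimal sketch of the zero-shot inference just described; this is not the authors' released implementation (see github.com/osainz59/Ask2Transformers for that), and the templates, type constraints, threshold, and NLI checkpoint are all illustrative.

# Minimal sketch of zero-shot entailment-based relation extraction inference.
# Templates, type constraints, threshold, and checkpoint are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()
ENTAIL = 2  # index of the entailment label in this checkpoint's config

TEMPLATES = {                         # T_r: verbalization templates per relation
    "per:city_of_death": ["{e1} died in {e2}."],
    "per:city_of_birth": ["{e1} was born in {e2}."],
}
TYPE_CONSTRAINTS = {                  # delta_r: allowed (type(e1), type(e2)) pairs
    "per:city_of_death": {("PERSON", "CITY")},
    "per:city_of_birth": {("PERSON", "CITY")},
}

def entail_prob(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.softmax(-1)[0, ENTAIL].item()

def relation_probs(text, e1, t1, e2, t2):
    """P_r = delta_r(type(e1), type(e2)) * max over templates of P(entailment)."""
    probs = {}
    for rel, templates in TEMPLATES.items():
        delta = 1.0 if (t1, t2) in TYPE_CONSTRAINTS[rel] else 0.0
        probs[rel] = delta * max(
            entail_prob(text, t.format(e1=e1, e2=e2)) for t in templates
        )
    return probs

text = ("Billy Mays, the bearded, boisterous pitchman who, as the undisputed king of "
        "TV yell and sell, became an unlikely pop culture icon, died at his home in "
        "Tampa, Fla, on Sunday.")
probs = relation_probs(text, "Billy Mays", "PERSON", "Tampa", "CITY")
best = max(probs, key=probs.get)
print(best if probs[best] > 0.5 else "no_relation")   # threshold is illustrative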

Fine-tuning with a prompted Relation Extraction dataset

When a few annotated examples are available, they are recast as premise-hypothesis pairs and used to fine-tune the entailment model (FEW-SHOT):

Example with a relation (per:city_of_death; Billy Mays[PERSON], Tampa[CITY]):
– Context. [SEP] Billy Mays died in Tampa.
– Context. [SEP] Billy Mays was born in Tampa.

Example with no relation (no_relation; Baldwin[PERSON], executive[TITLE]):
– Context. [SEP] Baldwin is a executive.

Fine-tune the MLM (entailment model) with these prompted examples.
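A minimal sketch of recasting an annotated example into premise-hypothesis training pairs; the label assignment (gold template as entailment, other templates as neutral) is an assumption of this sketch, not a claim about the paper's exact recipe.

# Minimal sketch of recasting an annotated RE example into NLI fine-tuning pairs.
# Assumption of this sketch: the gold relation's template is labeled entailment and the
# other relations' templates neutral; this is illustrative, not the paper's exact recipe.
def prompted_pairs(context, e1, e2, gold_relation, templates):
    """templates: dict mapping relation -> list of hypothesis templates."""
    pairs = []
    for rel, rel_templates in templates.items():
        label = "entailment" if rel == gold_relation else "neutral"
        for t in rel_templates:
            pairs.append((context, t.format(e1=e1, e2=e2), label))
    return pairs

pairs = prompted_pairs(
    "Billy Mays ... died at his home in Tampa, Fla, on Sunday.",
    "Billy Mays", "Tampa", "per:city_of_death",
    {"per:city_of_death": ["{e1} died in {e2}."],
     "per:city_of_birth": ["{e1} was born in {e2}."]},
)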

Evaluation dataset

TACRED (Zhang et al., 2017), based on TAC
– 41 relation labels (positive), no_relation (negative)

Training regimes:

● Zero-shot: 0 examples

● Few-shot:
– 5 examples per class (1%)
– 23 examples per class (5%)
– 46 examples per class (10%)

● Full-train: 460 examples per class

Evaluation: zero-shot

Zero-shot relation extraction:

● Best results with DeBERTa

● Note that minor variations in MNLI accuracy (±2) produce large variations in F1

Evaluation: few-shot

Few-shot relation extraction:

● State-of-the-art systems have difficulties learning the task when only a very small amount of data is annotated

● Their scores are lower than our zero-shot system (F1 57)

● Our systems show large improvements over the SOTA systems; our 1% setting beats their 10%

● The DeBERTa models score best

Entailment for prompt-based Event Argument Extraction (Sainz et al. 2022, NAACL Findings)

Given the success on Relation Extraction, we extended the work:

● Check Event Argument Extraction

● Transfer knowledge across event schemas

● Measure the effect of different NLI datasets

● Measure domain-expert hours

Given an event e, an argument candidate a and a context c, predict the argument relation (if any) holding between the event and the candidate in the context.

text: In 1997, the company hired John D. Idol to take over as chief executive.
event trigger: hired (START-POSITION)    argument candidate: John D. Idol (PERSON)
gold argument role: Start-Position:Person

The same pipeline applies: the verbalizer turns each argument role into a template, e.g. "a was hired" for start-position:person, yielding the hypothesis "John D. Idol was hired."; the entailment model is then run over all candidate hypotheses and the best-scoring role is returned, e.g.:

– Buyer: John D. Idol bought something.
– Person: John D. Idol was hired.
– Entity: John D. Idol hired someone.
– Vehicle: John D. Idol was used as a vehicle.
– Defendant: John D. Idol was the defendant.
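A minimal sketch showing the same entailment inference applied to argument roles; the role templates and checkpoint are illustrative, not the paper's exact verbalizations.

# Minimal sketch: entailment inference for event argument extraction.
# Role templates and NLI checkpoint are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()
ENTAIL = 2  # entailment index in this checkpoint's label set

ROLE_TEMPLATES = {
    "start-position:person": "{a} was hired.",
    "start-position:entity": "{a} hired someone.",
    "transaction:buyer":     "{a} bought something.",
}

def entail_prob(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.softmax(-1)[0, ENTAIL].item()

text = "In 1997, the company hired John D. Idol to take over as chief executive."
candidate = "John D. Idol"
scores = {role: entail_prob(text, t.format(a=candidate)) for role, t in ROLE_TEMPLATES.items()}
print(max(scores, key=scores.get))   # expected: start-position:person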

Evaluation datasets

ACE (Walker et al., 2006): 22 argument types.
WikiEvents (Li et al., 2021): 59 argument types.

Training (ACE / WikiEvents):

● Zero-shot: 0 examples

● Few-shot: 11 / 4 examples per class (5%)

● Full-train: 220 / 80 examples per class (100%)

Evaluation: ACE and WikiEvents

● EM is a fine-tuned RoBERTa (a strong baseline)

● NLI is our entailment-based system (also RoBERTa)

Transfer between schemas works!

● NLI+: pre-train also on examples from the other schema (WikiEvents or ACE, respectively)

State of the art

● We beat the SOTA with NLI; further improvement with NLI+

● NLI+ matches the full-train SOTA with only 5% of the annotations

● A 20x reduction in annotated examples with respect to the SOTA

The more NLI pre-training the better

● Combining NLI training datasets helps (also on TACRED)

Is this because of a brilliant domain expert?

● We gave the task to a computational linguist PhD

● Very similar results across all training regimes

● Replicable, robust to variations in prompts

● She also found writing prompts very friendly:

“Writing templates is more natural and rewarding than annotating examples, which is more repetitive, stressful and tiresome.”

“When writing templates, I was thinking in an abstract manner, trying to find generalizations. When doing annotation I was paying attention to concrete cases.”

What is the manual cost of prompts compared to annotation?

● Time devoted by the domain expert to template writing:
– Max. 15 minutes per argument
– ACE: 5 hours for 22 argument types
– WikiEvents: 12 hours for 59 argument types

● Estimate of domain-expert time for annotation:
– ACE: 180 hours for the whole dataset (16,500 examples)
– A severe under-estimate: no quality control, no team, speedy annotation requested

Two frameworks at 9 hours of domain-expert effort (ACE):

1) Define, annotate, train: annotate 850 examples (9h, 5% of the training data)

2) Verbalize while defining: write prompts (5h), annotate 350 examples (4h, 2%)

Two frameworks at 23 hours of domain-expert effort (ACE):

1) Define, annotate, train: annotate (23h, 13%)

2) Verbalize while defining: write prompts (5h), annotate (18h, 10%)

With 23 hours (10% of the training data), our entailment model matches a fine-tuned model costing at least 180 hours (full train), with the same number of parameters.

Conclusions for prompt-based extraction using entailment

● Very effective for zero- and few-shot IE

● Allows for transfer across schemas (for the first time)

● At least 8x less effort for the domain expert

● It is now feasible to build an IE system from scratch with limited effort
– Develop the schema and its verbalization at the same time
– Verbalize, then annotate a few examples

Verbalize while defining, interactive workflow (Sainz et al. 2022, NAACL demo)

1) Domain expert defines entities and relations in English

2) Runs the definitions on examples

3) Annotates a handful of incorrect examples

4) Iterates!

● User interface for NERC, RE, EE, EAE

● 2-minute video

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction

● Conclusions


Conclusions

● Pre-train, prompt and entail works
– Using “smaller” LMs

● Few-shot Information Extraction is here

● Verbalize while defining, interactive workflow
– Domain expert defines entities and relations in English
– Runs the definitions on examples
– Annotates a handful of incorrect examples, iterates

● Lower cost for building IE applications

● Friendlier to domain experts

● Slides on my website, code at:
https://github.com/osainz59/Ask2Transformers

Ongoing work

● Verbalize while defining, interactive workflow
– Check real use-cases (e.g. analysts in the BETTER program)

● Pre-train, prompt and entail works
– Check tasks beyond IE
– Compare head-to-head to plain LM prompting (PET) and QA

● Beyond DL: reasoning research
– Identify useful inferences to extend NLI datasets
– Entailment as a method to teach inference to LMs

Few-shot Information Extraction: Pre-train, Prompt, Entail

Eneko Agirre, Director of HiTZ
Basque Center for Language Technology (UPV/EHU)
@eagirre · https://hitz.eus/eneko/
https://github.com/osainz59/Ask2Transformers

● Relation extraction (Sainz et al. 2021, EMNLP)

● Event-argument extraction (Sainz et al. 2022, NAACL findings)

● Several IE tasks (Sainz et al. 2022, NAACL demo)

THANKS!