Few-shot Information Extraction: Pre-train, Prompt, Entail

EAI - Northeastern University

Eneko Agirre
Director of HiTZ, Basque Center for Language Technology (UPV/EHU)
@eagirre · https://hitz.eus/eneko/

In collaboration with ...

We are hiring!

Few-shot Information Extraction?

● Adoption of NLP in companies is deterred by the high effort required of domain experts
– In the case of Information Extraction: define non-trivial schemas with the entities and relations of interest, annotate a corpus, train a supervised ML system
– Define, annotate, train

● Verbalize while defining, interactive workflow
– Domain expert defines entities and relations in English
– Runs the definitions on examples
– Annotates a handful of incorrect examples, iterates

Example schema definitions:

Named-Entity Classification (NEC)
– PERSON: Each distinct person or set of people mentioned in a doc.
– ORG: ...  GPE: ...  DATE: ...

Event Extraction (EE)
– LIFE.DIE: A DIE Event occurs whenever the life of a PERSON Entity ends.

Relation Extraction (RE)
– EMPLOYEEOF: Employment captures the relationship between Persons and their employers. This Relation is only taggable when it can be reasonably assumed that the PER is paid by the ORG or GPE.

Event Argument Extraction (EAE)
– VICTIM-ARG: The person(s) who died
– PLACE-ARG: Where the death takes place

Annotation guidelines:
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-entities-guidelines-v6.6.pdf
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-events-guidelines-v5.4.3.pdf
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-relations-guidelines-v6.2.pdf

Example sentence:
John Smith, an executive at XYZ Co., died in Florida on Sunday.

Fully annotated:
John Smith[PERSON], an executive at XYZ Co.[ORGANIZATION], died[LIFE.DIE] in Florida[GPE] on Sunday[DATE].
Relation: EMPLOYEEOF (John Smith, XYZ Co.)
Event arguments of the LIFE.DIE event: VICTIM-ARG (John Smith), PLACE-ARG (Florida), TIME-ARG (Sunday)

Few-shot Information Extraction?

● Interactive workflow: verbalize while defining
– Domain expert defines entities and relations in English
– Runs the definitions on examples
– Annotates a handful of incorrect examples, iterates

The definitions are verbalized as templates:

NEC VERBALIZATIONS
{X} is a person. → PERSON
{X} is a date. → DATE

EVENT VERBALIZATIONS
{E} refers to a death. → LIFE.DIE

RELATION VERBALIZATIONS
{X} is employed by {Y} → EMPLOYEEOF

EVENT ARGUMENT VERBALIZATIONS
{X} died. → VICTIM-ARG
Someone {E} in {X}. → PLACE-ARG

Applied to the example:
John Smith[PERSON], an executive at XYZ Co.[ORGANIZATION], died[LIFE.DIE] in Florida[GPE] on Sunday[DATE].
EMPLOYEEOF, VICTIM-ARG, PLACE-ARG, TIME-ARG
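To make the verbalization idea concrete, here is a minimal sketch of how such templates can be stored and filled in code; the dictionary layout and the fill() helper are illustrative, not the demo system's actual data structures.

# Minimal sketch: verbalizations as templates keyed by label (illustrative only).
NEC_VERBALIZATIONS = {
    "PERSON": "{X} is a person.",
    "DATE": "{X} is a date.",
}
RELATION_VERBALIZATIONS = {
    "EMPLOYEEOF": "{X} is employed by {Y}",
}
EVENT_ARGUMENT_VERBALIZATIONS = {
    "VICTIM-ARG": "{X} died.",
    "PLACE-ARG": "Someone {E} in {X}.",
}

def fill(template: str, **spans: str) -> str:
    """Replace the {X}/{Y}/{E} placeholders with text spans from the sentence."""
    return template.format(**spans)

print(fill(RELATION_VERBALIZATIONS["EMPLOYEEOF"], X="John Smith", Y="XYZ Co."))
# John Smith is employed by XYZ Co.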

Few-shot Information Extraction?

Define, annotate, train vs. Interactive workflow: verbalize while defining

● 10 times more effective in terms of domain-expert time

● Friendlier for domain experts


Few-shot Information Extraction?

Thanks to the latest advances:

● Large pre-trained language models (LMs)

● Recast IE into natural language instructions and prompts

But even the largest LMs have limited inference ability:

● Enhance the inference abilities of LMs with entailment datasets

● Recast IE as an entailment problem

Plan for the talk

● Pre-trained Language Models

● Prompting

● Entailment

● Few-shot Information Extraction


Pre-trained Language Models

1) Self-supervised LM pre-training

– Unlabelled data: HUGE corpora: Wikipedia, news, web crawl, social media, etc.

– Train some variant of a Language Model

2) Supervised pre-training

– Very common in vision (ImageNet), standalone; in NLP, in conjunction with self-supervised LM pre-training

– Task-specific: e.g. transfer from one Q&A dataset to another

– Pivot task: e.g. entailment or Q&A (e.g. Sainz et al. 2021; Wang et al. 2021)

– All available tasks (e.g. T0, Sanh et al. 2021)

Self-supervised LM pre-training

Informally, learn parameters ϴ using some variant of Pϴ(text | some other text)

Two main variants:

● (Causal) Language Model (GPT)

● Masked Language Model (BERT)
– Self-attention: left and right
– Loss: masked words
– At inference it can fill explicitly masked tokens

[CLS] The sky is [MASK] . [SEP]
→ blue = 20.60%  red = 6.15%  clear = 5.84%  orange = 4.11% ...

Pre-Trained Models: Past, Present and Future (Han et al. 2021)
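As an illustration of masked-token filling, here is a minimal sketch using the Hugging Face transformers fill-mask pipeline; the checkpoint is illustrative, and the probabilities it returns will differ from the ones on the slide.

# Minimal sketch: fill an explicitly masked token with a masked language model.
# Checkpoint choice is illustrative; actual probabilities will differ from the slide.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The sky is [MASK].", top_k=4):
    # Each candidate carries the predicted token and its probability.
    print(f"{candidate['token_str']:>8} = {candidate['score']:.2%}")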

Fine-tuning on a specific task

Sentence classification: add a classification head on top of the [CLS] token

Example: Sentiment Analysis
Training example: (The sky is fantastic, Positive)

[CLS] The sky is fantastic . [SEP]
→ Positive = 82%  Negative = 18%

Pre-Trained Models: Past, Present and Future (Han et al. 2021)
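A minimal sketch of this fine-tuning setup, with a classification head on top of the [CLS] representation; the checkpoint, label encoding, and single-example "training step" are illustrative only.

# Minimal sketch: sentence classification by fine-tuning a classification head on [CLS].
# Checkpoint and label encoding (1 = Positive, 0 = Negative) are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer("The sky is fantastic", return_tensors="pt")
labels = torch.tensor([1])                 # one training example: (text, Positive)

outputs = model(**batch, labels=labels)    # loss for this fine-tuning step
outputs.loss.backward()                    # gradients for a parameter update

probs = outputs.logits.softmax(dim=-1)     # e.g. [P(Negative), P(Positive)]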

Why do Pre-trained LMs work so well?

● Language modeling is a very difficult task, even for humans.

– LMs compress any possible context into a vector that generalizes over possible completions.

– Forced to learn syntax, semantics, encode facts about the world, etc.

● LMs consume huge amounts of data

● The fine-tuning stage exploits the knowledge about language already in the LM, instead of starting from scratch

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction


What is prompt learning?

Rationale: recast NLP tasks into natural language, so that pre-trained Language Models can apply their knowledge about language and the world

Related ideas: zero-shot and few-shot

Learn a task with a minimal task description:
– Instructions on what the task is
– Present the task to the LM as a prompt
– (Few-shot) prepend a handful of labeled examples

Sentiment analysis

Task: classify "The sky is fantastic." as Positive or Negative.

Fine-tuned LM:
[CLS] The sky is fantastic . [SEP] → Positive = 82%  Negative = 18%

LM prompting (zero-shot), with a frozen masked LANGUAGE MODEL:

The sky is fantastic . It was [MASK] !

P1 = P(great | The sky is fantastic. It was [MASK] !)
P2 = P(terrible | The sky is fantastic. It was [MASK] !)
If P1 > P2 then Positive
e.g. great = 12%, terrible = 4% → Positive

Language Models are Few-Shot Learners (Brown et al. 2020)
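A minimal sketch of the zero-shot recipe above: prompt a frozen masked LM and compare the [MASK] probabilities of the two verbalizer words; the model and template are illustrative.

# Minimal sketch of zero-shot prompting with a frozen masked LM:
# compare the [MASK] probabilities of "great" vs. "terrible". No parameters are updated.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def verbalizer_prob(text: str, word: str) -> float:
    """Probability of `word` at the [MASK] position of the prompted input."""
    prompt = f"{text} It was {tokenizer.mask_token} !"
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    word_id = tokenizer.convert_tokens_to_ids(word)
    return logits.softmax(dim=-1)[word_id].item()

text = "The sky is fantastic."
label = "Positive" if verbalizer_prob(text, "great") > verbalizer_prob(text, "terrible") else "Negative"
print(label)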

LM prompting (few-shot): in-context learning

Training data:
Text: I’m not sure I like it.            Label: Negative
Text: Thank you for the amazing help.    Label: Positive

The labeled examples are verbalized and prepended to the input:
S1 = I’m not sure I like it. It was terrible!
S2 = Thank you for the amazing help. It was great!
S  = The sky is fantastic. It was ____

Frozen LANGUAGE MODEL, no parameter update:
P1 = P(great | S1 \n S2 \n The sky is fantastic. It was )
P2 = P(terrible | S1 \n S2 \n The sky is fantastic. It was )
If P1 > P2 then Positive

Language Models are Few-Shot Learners (Brown et al. 2020)
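A minimal sketch of in-context learning with a frozen causal LM, using GPT-2 as a stand-in for GPT-3; the demonstrations and verbalizer words follow the slide, and no parameters are updated.

# Minimal sketch of in-context (few-shot) prompting with a frozen causal LM.
# GPT-2 stands in for GPT-3 here; prompt and verbalizer words are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = (
    "I'm not sure I like it. It was terrible!\n"
    "Thank you for the amazing help. It was great!\n"
    "The sky is fantastic. It was"
)

def next_word_prob(word: str) -> float:
    """Probability of `word` as the continuation of the prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]        # distribution over the next token
    word_id = tokenizer(" " + word).input_ids[0]      # GPT-2 tokens carry a leading space
    return logits.softmax(dim=-1)[word_id].item()

label = "Positive" if next_word_prob("great") > next_word_prob("terrible") else "Negative"
print(label)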

Domain experts provide templates / label maps

Template: [x] It was __ !
Label map: great <=> positive

I’m not sure I like it. It was terrible!
Thank you for the amazing help. It was great!
The sky is fantastic. It was ____

Template: Review: [x] Sentiment: __
Label map: positive <=> positive

Review: I’m not sure I like it. Sentiment: negative
Review: Thank you for the amazing help. Sentiment: positive
Review: The sky is fantastic. Sentiment: ____

LM prompting (zero-shot and few-shot): in-context learning (Brown et al. 2020)

● Good results with the largest GPT-3 models (175B)

● Even though there is no parameter update

● Large variance depending on prompts (templates and label map)

Few-shot learning with prompts and parameter updates

Traditional fine-tuning:
– Training example: (The sky is fantastic, Positive)
– [CLS] The sky is fantastic . [SEP] → Positive = 82%  Negative = 18%
– Low results in the few-shot setting

Fine-tune the LM using prompted datasets, usually a smaller LM (e.g. PET):
– Training example – input and label: (The sky is fantastic, Positive)
– Prompted training example – input and label: (The sky is fantastic. It was [MASK] !, great)
– The sky is fantastic . It was [MASK] ! → great = 12%  terrible = 4%

PET outperforms GPT-3 with 1000x fewer parameters (via ensembling and iterations)

Exploiting Cloze Questions for Few Shot Text Classification and NLI (Schick and Schütze, 2020)
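A minimal PET-style sketch, assuming the template and verbalizer from the slide: the labeled example is recast as a cloze example and the masked LM is fine-tuned to predict the verbalizer word at the [MASK] position.

# Minimal PET-style sketch: turn a labeled example into a cloze (prompted) training
# example. Template, verbalizer, and checkpoint are illustrative, not PET's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

verbalizer = {"Positive": "great", "Negative": "terrible"}

def prompted_example(text: str, label: str):
    """(text, label) -> (input ids with [MASK], label ids for the verbalizer word)."""
    prompt = f"{text} It was {tokenizer.mask_token} !"
    enc = tokenizer(prompt, return_tensors="pt")
    labels = torch.full_like(enc.input_ids, -100)            # ignore all positions...
    mask_pos = (enc.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    labels[0, mask_pos] = tokenizer.convert_tokens_to_ids(verbalizer[label])
    return enc, labels                                        # ...except the [MASK] slot

enc, labels = prompted_example("The sky is fantastic.", "Positive")
loss = model(**enc, labels=labels).loss                       # one fine-tuning step's loss
loss.backward()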

Few-shot learning with prompts and parameter updates

● T-Few outperforms GPT-3 on held-out T0 tasks

● With 80 times fewer parameters

● (The chart shows efficiency at inference)

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (Liu et al. 2022)

Conclusions on prompting

● Size of models and update of parameters
– Larger LMs, no update: best zero-shot, strong few-shot
– Smaller LMs, with updates: best few-shot

● Inference ability of LMs is limited:
– Poor results on entailment datasets (Brown et al. 2020)
– BIG-BENCH: model performance and calibration both improve with scale, but are poor in absolute terms (Srivastava et al. 2022)
– No wonder: LMs are capped by the phenomena needed to predict masked words, so there is no need to learn anything else

Conclusions on prompting

Improving inference ability is an open problem:

● Chain-of-thought (fine-tuning)

● Prompted datasets - instructions (fine-tuning)

● Reinforcement learning with human feedback

● Combine LMs with reasoners and tools

Our proposal: teach inference ability via labeled entailment datasets

PaLM: Scaling Language Modeling with Pathways (Chowdhery et al. 2022)

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks (Wang et al. 2022)

Training language models to follow instructions with human feedback (Ouyang et al. 2022)

Augmented Language Models: a Survey (Mialon et al. 2023)

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction


Textual Entailment (RTE), Natural Language Inference (NLI)

Dagan et al. 2005 (refined by Manning et al. 2006)

● We say that a Text entails a Hypothesis if, typically, a human reading the Text would infer that the Hypothesis is most likely true.

Text (Premise): I’m not sure what the overnight low was
Hypothesis: I don't know how cold it got last night.
Labels: {entailment, contradiction, neutral}

(Example from the Bowman and Zhu NAACL 2019 tutorial)

NLI datasets are widely used to measure the quality of models. To perform well, models need to tackle several linguistic phenomena:
– Lexical entailment (cat vs. animal, cat vs. dog)
– Quantification (all, most, fewer than eight)
– Lexical ambiguity and scope ambiguity (bank, ...)
– Modality (might, should, ...)
– Common-sense background knowledge
– ...
– Compositional interpretation without grounding

Textual Entailment (RTE),

Natural Language Inference (NLI)

Common tasks can be cast as entailment premise-hypothesis pairs:

● Information Extraction: Given a text (premise), check whether it entails a relation (hypothesis)

● Question Answering: given a question (premise) identify a text that entails an answer (hypothesis)

● Information Retrieval: Given a query (hypothesis) identify texts that entail the query (premise)

● Summarization ...


Textual Entailment (RTE), Natural Language Inference (NLI)

Datasets:

● RTE 1-7 (Dagan et al. 2006-2012)
– Premises (texts) drawn from naturally occurring text; expert-constructed hypotheses. 5,000 examples.

● SNLI, MultiNLI (Bowman et al. 2015; Williams et al. 2017)
– Crowdsourcers provided hypotheses for captions; MultiNLI extended to other genres. 1 million examples.
– Biases in hypotheses (Gururangan et al., 2018; Poliak et al., 2018)
– Data generation with naïve annotators (Geva et al. 2019), artefacts

● FEVER-NLI (Nie et al. 2019)
– Fact verification dataset. 200,000 examples.

● ANLI (Nie et al. 2020)
– Manually created adversarial examples. 168,000 examples.

Textual Entailment (RTE), Natural Language Inference (NLI)

Fine-tune an MLM on NLI (Devlin et al. 2019):

[CLS] Premise [SEP] Hypothesis [SEP]
→ Entailment = 72%  Contradiction = 12%  Neutral = 16%
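A minimal sketch of scoring a premise-hypothesis pair with an off-the-shelf NLI model; roberta-large-mnli is one publicly available checkpoint, used here only as an example.

# Minimal sketch: score a premise/hypothesis pair with an off-the-shelf NLI model.
# Label order is read from the checkpoint's own config.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

premise = "I'm not sure what the overnight low was"
hypothesis = "I don't know how cold it got last night."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

for i, label in model.config.id2label.items():
    print(f"{label}: {probs[i]:.1%}")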

Textual Entailment (RTE), Natural Language Inference (NLI)

GPT-3 using prompts: the premise and the hypothesis are presented in a prompt and the model predicts the label.

GPT-3 using prompts fails:

"These results on both RTE and ANLI suggest that NLI is still a very difficult task for language models"
Language Models are Few-Shot Learners (Brown et al. 2020)

Also confirmed for InstructGPT3 and PaLM 540B
● Results of PaLM only improved when fine-tuning on NLI data
PaLM: Scaling Language Modeling with Pathways (Chowdhery et al. 2022)

Textual Entailment (RTE), Natural Language Inference (NLI)

GPT-3 using prompts fails on the diagnostic NLI dataset (Wang et al., 2019), also used in the SuperGLUE leaderboard.

Matthews Correlation scores from the SuperGLUE leaderboard, by phenomenon (low to high):

Double Negation: 0.0     Morphological Negation: 0.0    Anaphora/Coreference: 1.7
Nominalization: 2.6      Downward Monotone: 3.6         Conjunction: 4.0
Existential: 6.1         Disjunction: 7.4               Logic: 10.6
Negation: 11.6           Temporal: 12.4                 Common Sense: 28.4
Lexical Semantics: 30.0  Factivity: 31.6                Knowledge: 32.0
World Knowledge: 33.0    Active/Passive: 34.5           Universal: 39.6
Intersectivity: 41.4     Restrictivity: 48.5            Quantifiers: 59.5

"These results on both RTE and ANLI suggest that NLI is still a very difficult task for language models"
Language Models are Few-Shot Learners (Brown et al. 2020)

Also confirmed for PaLM 540B (Chowdhery et al. 2022)

Overcoming limitations of LM

LMs fail on many inferences in NLI datasets

Our hypothesis:

Fine-tuning LMs on NLI datasets allows them to learn certain inferences ...

… which they will then apply on target tasks

Entailment as Few-Shot Learner (Wang et al. 2021)

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction

Few-shot Information Extraction?

Our proposal:

● Use “smaller” language models

● Additional pre-training with NLI datasets => Entailment Models

● Recast IE tasks into text-hypothesis pairs

● Run the entailment model (zero-shot)

● Fine-tune the entailment model (few-shot, full train)

We will present our work on:

● Relation extraction (Sainz et al. 2021, EMNLP)

● Event-argument extraction (Sainz et al. 2022, NAACL findings)

● Several IE tasks (Sainz et al. 2022, NAACL demo)

Entailment for prompt-based Relation Extraction (Sainz et al. 2021, EMNLP)

Given two entities e1 and e2 and a context c, predict the schema relation (if any) holding between the two entities in the context.

text: Billy Mays, the bearded, boisterous pitchman who, as the undisputed king of TV yell and sell, became an unlikely pop culture icon, died at his home in Tampa, Fla, on Sunday.
entities: Billy Mays (PERSON), Tampa (CITY)
gold relation: per:city_of_death

1) Verbalizer: each relation is verbalized by one or more templates, e.g. "e1 died in e2" for per:city_of_death. Filling the templates with the entity pair yields the candidate hypotheses:
– Billy Mays was born in Tampa.
– Billy Mays’s birthday is on Tampa.
– Billy Mays is Tampa years old.
– ...
– Billy Mays died in Tampa.

2) Run the entailment (NLI) model: compute the entailment / neutral / contradiction probabilities for each hypothesis independently, with the text as premise.

3) Relation probability inference: the probability of relation r combines the hypothesis probabilities with the entity-type constraints:

P_r(x, e1, e2) = δ_r(type(e1), type(e2)) · max_{t ∈ T_r} P_entailment(t(e1, e2) | x)

where T_r is the set of templates verbalizing relation r and the δ_r function encodes the entity-type constraints of r; for the example, P_per:city_of_death(x, Billy Mays, Tampa) = δ_r(PERSON, CITY) · max over T_r.

4) Finally, we return the relation with the highest probability; if none of the relations is entailed, then r = no_relation.

No relation-annotated examples are needed: ZERO-SHOT.
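A minimal sketch of the zero-shot inference just described; this is not the authors' released implementation (see github.com/osainz59/Ask2Transformers for that), and the templates, type constraints, threshold, and NLI checkpoint are all illustrative.

# Minimal sketch of zero-shot entailment-based relation extraction inference.
# Templates, type constraints, threshold, and checkpoint are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()
ENTAIL = 2  # index of the entailment label in this checkpoint's config

TEMPLATES = {                         # T_r: verbalization templates per relation
    "per:city_of_death": ["{e1} died in {e2}."],
    "per:city_of_birth": ["{e1} was born in {e2}."],
}
TYPE_CONSTRAINTS = {                  # delta_r: allowed (type(e1), type(e2)) pairs
    "per:city_of_death": {("PERSON", "CITY")},
    "per:city_of_birth": {("PERSON", "CITY")},
}

def entail_prob(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.softmax(-1)[0, ENTAIL].item()

def relation_probs(text, e1, t1, e2, t2):
    """P_r = delta_r(type(e1), type(e2)) * max over templates of P(entailment)."""
    probs = {}
    for rel, templates in TEMPLATES.items():
        delta = 1.0 if (t1, t2) in TYPE_CONSTRAINTS[rel] else 0.0
        probs[rel] = delta * max(
            entail_prob(text, t.format(e1=e1, e2=e2)) for t in templates
        )
    return probs

text = ("Billy Mays, the bearded, boisterous pitchman who, as the undisputed king of "
        "TV yell and sell, became an unlikely pop culture icon, died at his home in "
        "Tampa, Fla, on Sunday.")
probs = relation_probs(text, "Billy Mays", "PERSON", "Tampa", "CITY")
best = max(probs, key=probs.get)
print(best if probs[best] > 0.5 else "no_relation")   # threshold is illustrative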

Fine-tuning with a prompted Relation Extraction dataset

When a few annotated examples are available, they are recast as premise-hypothesis pairs and used to fine-tune the entailment model (FEW-SHOT):

Example with a relation (per:city_of_death; Billy Mays[PERSON], Tampa[CITY]):
– Context. [SEP] Billy Mays died in Tampa.
– Context. [SEP] Billy Mays was born in Tampa.

Example with no relation (no_relation; Baldwin[PERSON], executive[TITLE]):
– Context. [SEP] Baldwin is a executive.

Fine-tune the MLM (entailment model) with these prompted examples.
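A minimal sketch of recasting an annotated example into premise-hypothesis training pairs; the label assignment (gold template as entailment, other templates as neutral) is an assumption of this sketch, not a claim about the paper's exact recipe.

# Minimal sketch of recasting an annotated RE example into NLI fine-tuning pairs.
# Assumption of this sketch: the gold relation's template is labeled entailment and the
# other relations' templates neutral; this is illustrative, not the paper's exact recipe.
def prompted_pairs(context, e1, e2, gold_relation, templates):
    """templates: dict mapping relation -> list of hypothesis templates."""
    pairs = []
    for rel, rel_templates in templates.items():
        label = "entailment" if rel == gold_relation else "neutral"
        for t in rel_templates:
            pairs.append((context, t.format(e1=e1, e2=e2), label))
    return pairs

pairs = prompted_pairs(
    "Billy Mays ... died at his home in Tampa, Fla, on Sunday.",
    "Billy Mays", "Tampa", "per:city_of_death",
    {"per:city_of_death": ["{e1} died in {e2}."],
     "per:city_of_birth": ["{e1} was born in {e2}."]},
)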

Evaluation dataset

TACRED (Zhang et al., 2017), based on TAC
– 41 relation labels (positive), no_relation (negative)

Training regimes:

● Zero-shot: 0 examples

● Few-shot:
– 5 examples per class (1%)
– 23 examples per class (5%)
– 46 examples per class (10%)

● Full-train: 460 examples per class

Evaluation: zero-shot

Zero-shot relation extraction:

● Best results with DeBERTa

● Note that minor variations in MNLI accuracy (±2) produce large variations in F1

Evaluation: few-shot

Few-shot relation extraction:

● State-of-the-art systems have difficulties learning the task when only a very small amount of data is annotated

● Their scores are lower than our zero-shot system (F1 57)

● Our systems show large improvements over the SOTA systems; our 1% setting beats their 10%

● The DeBERTa models score best

Entailment for prompt-based Event Argument Extraction (Sainz et al. 2022, NAACL Findings)

Given the success on Relation Extraction, we extended the work:

● Check Event Argument Extraction

● Transfer knowledge across event schemas

● Measure the effect of different NLI datasets

● Measure domain-expert hours

Given an event e, an argument candidate a and a context c, predict the argument relation (if any) holding between the event and the candidate in the context.

text: In 1997, the company hired John D. Idol to take over as chief executive.
event trigger: hired (START-POSITION)    argument candidate: John D. Idol (PERSON)
gold argument role: Start-Position:Person

The same pipeline applies: the verbalizer turns each argument role into a template, e.g. "a was hired" for start-position:person, yielding the hypothesis "John D. Idol was hired."; the entailment model is then run over all candidate hypotheses and the best-scoring role is returned, e.g.:

– Buyer: John D. Idol bought something.
– Person: John D. Idol was hired.
– Entity: John D. Idol hired someone.
– Vehicle: John D. Idol was used as a vehicle.
– Defendant: John D. Idol was the defendant.
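A minimal sketch showing the same entailment inference applied to argument roles; the role templates and checkpoint are illustrative, not the paper's exact verbalizations.

# Minimal sketch: entailment inference for event argument extraction.
# Role templates and NLI checkpoint are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()
ENTAIL = 2  # entailment index in this checkpoint's label set

ROLE_TEMPLATES = {
    "start-position:person": "{a} was hired.",
    "start-position:entity": "{a} hired someone.",
    "transaction:buyer":     "{a} bought something.",
}

def entail_prob(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.softmax(-1)[0, ENTAIL].item()

text = "In 1997, the company hired John D. Idol to take over as chief executive."
candidate = "John D. Idol"
scores = {role: entail_prob(text, t.format(a=candidate)) for role, t in ROLE_TEMPLATES.items()}
print(max(scores, key=scores.get))   # expected: start-position:person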

Evaluation datasets

ACE (Walker et al., 2006): 22 argument types.
WikiEvents (Li et al., 2021): 59 argument types.

Training (ACE / WikiEvents):

● Zero-shot: 0 examples

● Few-shot: 11 / 4 examples per class (5%)

● Full-train: 220 / 80 examples per class (100%)

Evaluation: ACE and WikiEvents

● EM is a fine-tuned RoBERTa (a strong baseline)

● NLI is our entailment-based system (also RoBERTa)

Transfer between schemas works!

● NLI+: pre-train also on examples from the other schema (WikiEvents or ACE, respectively)

State of the art

● We beat the SOTA with NLI; further improvement with NLI+

● NLI+ matches the full-train SOTA with only 5% of the annotations

● A 20x reduction in annotated examples with respect to the SOTA

The more NLI pre-training the better

● Combining NLI training datasets helps (also on TACRED)

Is this because of a brilliant domain expert?

● We gave the task to a computational linguist PhD

● Very similar results across all training regimes

● Replicable, robust to variations in prompts

● She also found writing prompts very friendly:

“Writing templates is more natural and rewarding than annotating examples, which is more repetitive, stressful and tiresome.”

“When writing templates, I was thinking in an abstract manner, trying to find generalizations. When doing annotation I was paying attention to concrete cases.”

What is the manual cost of prompts compared to annotation?

● Time devoted by the domain expert to template writing:
– Max. 15 minutes per argument
– ACE: 5 hours for 22 argument types
– WikiEvents: 12 hours for 59 argument types

● Estimate of domain-expert time for annotation:
– ACE: 180 hours for the whole dataset (16,500 examples)
– A severe under-estimate: no quality control, no team, speedy annotation requested

Two frameworks at 9 hours of domain-expert effort (ACE):

1) Define, annotate, train: annotate 850 examples (9h, 5% of the training data)

2) Verbalize while defining: write prompts (5h), annotate 350 examples (4h, 2%)

Two frameworks at 23 hours of domain-expert effort (ACE):

1) Define, annotate, train: annotate (23h, 13%)

2) Verbalize while defining: write prompts (5h), annotate (18h, 10%)

With 23 hours (10% of the training data), our entailment model matches a fine-tuned model costing at least 180 hours (full train), with the same number of parameters.

Conclusions for prompt-based extraction using entailment

● Very effective for zero- and few-shot IE

● Allows for transfer across schemas (for the first time)

● At least 8x less effort for the domain expert

● It is now feasible to build an IE system from scratch with limited effort
– Develop the schema and its verbalization at the same time
– Verbalize, then annotate a few examples

Verbalize while defining, interactive workflow (Sainz et al. 2022, NAACL demo)

1) Domain expert defines entities and relations in English

2) Runs the definitions on examples

3) Annotates a handful of incorrect examples

4) Iterates!

● User interface for NERC, RE, EE, EAE

● 2-minute video

Plan for this session

● Pre-trained LM

● Prompting

● Entailment

● Few-shot Information Extraction

● Conclusions


Conclusions

● Pre-train, prompt and entail works
– Using “smaller” LMs

● Few-shot Information Extraction is here

● Verbalize while defining, interactive workflow
– Domain expert defines entities and relations in English
– Runs the definitions on examples
– Annotates a handful of incorrect examples, iterates

● Lower cost for building IE applications

● Friendlier to domain experts

● Slides on my website, code at:
https://github.com/osainz59/Ask2Transformers

Ongoing work

● Verbalize while defining, interactive workflow
– Check real use-cases (e.g. analysts in the BETTER program)

● Pre-train, prompt and entail works
– Check tasks beyond IE
– Compare head-to-head to plain LM prompting (PET) and QA

● Beyond DL: reasoning research
– Identify useful inferences to extend NLI datasets
– Entailment as a method to teach inference to LMs

Few-shot Information Extraction: Pre-train, Prompt, Entail

Eneko Agirre, Director of HiTZ
Basque Center for Language Technology (UPV/EHU)
@eagirre · https://hitz.eus/eneko/
https://github.com/osainz59/Ask2Transformers

● Relation extraction (Sainz et al. 2021, EMNLP)

● Event-argument extraction (Sainz et al. 2022, NAACL findings)

● Several IE tasks (Sainz et al. 2022, NAACL demo)

THANKS!