Materials and Methods

Methods

The study draws on quantitative methods of corpus linguistics and on language acquisition methods. It is also informed by terminology research in translation studies and in law and language, as well as by drafting and terminological practice.

Materials: Terms

The study uses two large datasets:

  • Defined terms extracted from EU, UK and Irish legal acts (dataset 1);
  • Terms in the English section of the EU’s IATE term base (dataset 2).

The former reflects the perspective of drafters, the latter that of terminologists. Both datasets were uploaded to Sketch Engine for analysis.

  • Dataset 1: Defined terms in EU, UK and Irish legal acts

Since automatic term recognition proved ineffective, the study focuses on defined terms, a special category of terms, which were extracted from ten-year corpora of legal acts covering the period 2010-2019. The focus corpus is the EU corpus of English-language directives and regulations. The reference corpora are the corpus of Irish Public Acts (IPA) and the corpus of UK Public General Acts (UKPGA), representing two major English-language jurisdictions in Europe: Ireland and the United Kingdom. The period 2010-2019 was chosen so that both reference countries were EU member states throughout the sampling frame.

Details of the corpora.

Corpus                         Files   Words       Terms, types   Terms, tokens
REG: Regulations               1,040   9,168,665   7,043          5,218
DIR: Directives                156     2,328,195   2,228          1,575
UKPGA: UK Public General Acts  306     3,537,394   7,319          4,205
IPA: Irish Public Acts         244     3,615,306   7,018          3,879

The EU files were downloaded automatically from the EUR-Lex directory of legal acts. The search was limited to basic acts in force as at 2020; delegated and implementing acts were excluded. UK Public General Acts were downloaded from the UK statute database in their “Original as enacted” version, and Irish Public Acts (Acts of the Oireachtas) from the Irish Statute Book (eISB) website. Amending acts were excluded from all downloads.

To improve comparability between the corpora, the files were annotated in Notepad++ to separate the normative parts (enacting terms) from the other sections, i.e. citations, recitals and annexes in the EU acts and schedules in the Irish and UK acts. The files were then uploaded to Sketch Engine, part-of-speech tagged and lemmatized. Defined terms were extracted with wildcard queries, filtered by the lemma ‘mean’ occurring up to the fifth position to the right of the candidate term. The dataset therefore covers the whole population of defined terms that meet these criteria. The extracted terms were exported to Excel, verified manually and cleaned.
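In the study, this filtering was carried out with Sketch Engine queries over a lemmatized corpus. As a rough illustration only, the Python sketch below approximates the same logic on raw text. It assumes that defined terms are enclosed in straight double quotes or curly quotes, and it stands in for the lemma ‘mean’ with the word forms mean, means and meant; the function name extract_defined_terms and the sample text are hypothetical.

```python
import re
from itertools import islice

# Word forms standing in for the lemma 'mean' (an assumption: Sketch Engine
# matches the lemma through its own lemmatizer instead).
MEAN_FORMS = {"mean", "means", "meant"}

# A candidate term between quotation marks: "the Minister", “Act” or ‘consumer’.
QUOTED = re.compile(r"[\"“‘]([^\"“”‘’]{1,80})[\"”’]")
WORD = re.compile(r"\w+")


def extract_defined_terms(text, window=5):
    """Return quoted terms followed by a form of 'mean' within `window` words."""
    terms = []
    for match in QUOTED.finditer(text):
        # Take at most `window` words to the right of the closing quotation mark.
        following = islice(WORD.finditer(text, match.end()), window)
        if any(word.group().lower() in MEAN_FORMS for word in following):
            terms.append(match.group(1).strip())
    return terms


# Hypothetical sample mixing EU-style and UK/Irish-style definitions.
SAMPLE = (
    "In this Regulation, ‘consumer’ means any natural person; "
    "‘trader’ means any person acting for purposes relating to a trade. "
    "The ‘cooling-off period’ applies to all distance contracts concluded online. "
    '"the Minister" means the Minister for Finance.'
)

print(extract_defined_terms(SAMPLE))  # ['consumer', 'trader', 'the Minister']
```

The window parameter mirrors the five-position limit described above. A heuristic of this kind can still pick up quoted phrases that are not definitions, which is one reason why manual verification of the extracted list remains necessary.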

  • Dataset 2: IATE datasets

IATE (Interactive Terminology for Europe, https://iate.europa.eu/home), the multilingual interinstitutional terminology database of the EU institutions, is one of the world’s largest term bases and covers all 24 official EU languages. Terms were exported from the English section of the publicly available version of IATE, copied to Excel and cleaned semi-automatically. The result is the IATE FULL dataset, comprising 668,980 terms in 541,163 entries.
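Because IATE is organized by concept entries, one entry can hold several English terms (for example a full form and its abbreviation), which is why the term count exceeds the entry count. The Python sketch below shows one possible way to normalize exported (entry ID, term) pairs and count both units. The row format, the helper names and the cleaning rules are assumptions for illustration, not the actual IATE export format or the cleaning steps used in the study.

```python
def clean_rows(rows):
    """Collapse whitespace, drop empty terms and exact duplicates within an entry."""
    seen = set()
    cleaned = []
    for entry_id, term in rows:
        term = " ".join(term.split())  # trim ends and collapse internal whitespace
        if not term or (entry_id, term) in seen:
            continue
        seen.add((entry_id, term))
        cleaned.append((entry_id, term))
    return cleaned


def count_terms_and_entries(rows):
    """Each (entry, term) pair is one term; each distinct entry ID is one entry."""
    return len(rows), len({entry_id for entry_id, _ in rows})


# Hypothetical export rows: entry 1 holds a full form, a messy duplicate and an
# abbreviation; entry 3 contains only whitespace.
ROWS = [
    (1, "European Union"),
    (1, " European  Union "),
    (1, "EU"),
    (2, "legal act"),
    (3, "   "),
]

print(count_terms_and_entries(clean_rows(ROWS)))  # (3, 2)
```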

IATE datasets

                  IATE FULL   IATE EU   IATE EU LAW   IATE LAW   IATE PRIMARY
English terms     668,980     40,764    3,326         45,616     111,611
English entries   541,163     31,115    2,333         36,659     71,345

In search of prototypes, four smaller, more controlled datasets, ranging from 0.5% to 11% of the full dataset, were then extracted: IATE EU, IATE EU LAW and IATE LAW, based on thematic criteria, and IATE PRIMARY, based on a qualitative criterion. The thematic subsets were selected by EuroVoc thematic fields: IATE EU → 10 European Union; IATE EU LAW → 1011 European Union law; IATE LAW → 12 Law. These subsets reflect EU terminologists’ perception of which terms qualify as EU terms, EU legal terms and legal terms, respectively. IATE PRIMARY comprises entries marked as ‘primary’, that is, entries that meet minimum quality standards for information content (about 12% of IATE entries). Primary entries therefore offer a higher-quality snapshot of IATE: entries considered important enough to be elaborated in more detail.
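A minimal Python sketch of how such subsets could be drawn from an exported entry list, assuming that each entry carries a list of EuroVoc-style codes and a flag for the ‘primary’ marking. Matching codes by prefix (so that an entry coded 1011 also counts towards 10 European Union) is an assumption made here for illustration; the Entry record, the helper names and the sample entries are hypothetical and do not reproduce the IATE data model.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: int
    terms: list[str]    # English terms in the entry
    domains: list[str]  # EuroVoc-style codes, e.g. "10", "1011", "12"
    primary: bool = False  # hypothetical flag for the 'primary' marking


def in_domain(entry, code):
    """True if any of the entry's codes equals `code` or lies below it (prefix match)."""
    return any(domain.startswith(code) for domain in entry.domains)


# Selection rules for the four subsets (codes as given in the text).
SUBSETS = {
    "IATE EU": lambda e: in_domain(e, "10"),
    "IATE EU LAW": lambda e: in_domain(e, "1011"),
    "IATE LAW": lambda e: in_domain(e, "12"),
    "IATE PRIMARY": lambda e: e.primary,
}


def summarise(entries):
    """Return {subset name: (number of English terms, number of entries)}."""
    summary = {}
    for name, keep in SUBSETS.items():
        selected = [e for e in entries if keep(e)]
        summary[name] = (sum(len(e.terms) for e in selected), len(selected))
    return summary


# Invented sample entries, for illustration only.
SAMPLE_ENTRIES = [
    Entry(1, ["regulation"], ["1011"], primary=True),
    Entry(2, ["Member State", "MS"], ["10"]),
    Entry(3, ["contract", "agreement"], ["12"], primary=True),
    Entry(4, ["tariff quota"], ["20"]),
]

print(summarise(SAMPLE_ENTRIES))
# {'IATE EU': (3, 2), 'IATE EU LAW': (1, 1), 'IATE LAW': (2, 1), 'IATE PRIMARY': (3, 2)}
```

Counting terms and entries separately for each subset matches the two rows of the IATE table above; with prefix matching, every entry in IATE EU LAW is also counted in IATE EU.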