
Dataset Card for the Israeli Supreme Court Dataset

Dataset Summary

This dataset is a 2022 snapshot of the public verdicts and decisions of the Supreme Court of Israel, accompanied by rich metadata. The 5.31 GB dataset comprises 751,194 documents containing 2.68 GB of text in total. It can be loaded with the datasets package:

import datasets

# Download the dataset from the Hugging Face Hub (cached after the first call)
data = datasets.load_dataset('LevMuchnik/SupremeCourtOfIsrael')

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The vast majority of the documents in the database are in Hebrew. A small number of documents are in English.

Dataset Structure

The dataset is a JSON Lines file. Each line corresponds to a single document and contains the document's identification, text, and metadata.
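For readers working with the raw file rather than the datasets package, the following is a minimal sketch of reading a JSON Lines stream with the Python standard library. The field names (Id, Type, text) come from this card, but the sample values and the helper name iter_documents are hypothetical and only illustrate the one-document-per-line layout.

```python
import io
import json

# Hypothetical two-record sample: field names are taken from the card,
# the values are invented for illustration only.
SAMPLE_JSONL = (
    '{"Id": 1, "Type": "החלטה", "text": "sample decision text"}\n'
    '{"Id": 2, "Type": "פסק-דין", "text": "sample verdict text"}\n'
)


def iter_documents(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# An in-memory stream stands in for the real file here; with a local copy,
# pass an open file object (opened with encoding="utf-8") instead.
documents = list(iter_documents(io.StringIO(SAMPLE_JSONL)))
print(len(documents), documents[0]["Type"])
```

Reading line by line keeps memory use low, which matters for a file of this size.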

Data Instances

[More Information Needed]

Data Fields

The file contains the following fields:

  • case_id - running number for cases
  • download_time - when the document was downloaded (datetime)
  • number_of_case_documents - number of documents in the current case
  • file_name - full name of the document file, including relative path
  • Id - document id
  • CaseId - case id
  • VerdictDt - Date of the document (datetime)
  • CreatedDate - Date of when the document was inserted into the Supreme Court database
  • CaseNum - case number
  • CaseDesc - Unique case identifier. This id is used to reference cases within the Israeli legal system
  • Pages - number of pages in the original document
  • Path - relative path to the document
  • CaseName - formal name of the case
  • FileName - document file name, without path
  • DocName - document file name, without path
  • Year - document creation year
  • TypeCode - enumeration of document types (see Type field below)
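  • Type - document type. The values are listed below, each followed by its number of documents:
    • פסק-דין (judgment): 84339
    • החלטה (decision): 663099
    • צו ביניים (interim order): 22
    • פסקי דין באנגלית (judgments in English): 310
    • צו על תנאי (order nisi): 200
    • צו (order): 2606
    • פד"י (Piskei Din law reports): 302
    • תקצירים (summaries): 316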
  • Technical - boolean indicator of whether the document is technical or not.
  • CodeVolume - ?
  • document_hash - 256-bit hash of the document name, used internally to uniquely identify the document
  • text - text of the document. Multiple newlines and other document formatting elements (paragraphs, lists, etc.) are preserved.
  • html_title - document title extracted from the HTML
  • VerdictsDt - date of the verdict
  • meta_case_nm - formal case name
  • meta_sec_appeal - integer or None
  • meta_side_ty - case type, list of strings
  • meta_verdict_file_nm - name of the verdict file
  • meta_judge - list of the names of the case's judges
  • meta_mador_nm - name of the court instance (e.g. בג"ץ)
  • meta_side_nm - list of the case parties, list of strings
  • meta_verdict_dt - date of the verdict
  • meta_case_dt - date of the case
  • meta_verdict_nbr -
  • meta_ProgId - name of the software used to create the document (None, Word, etc)
  • meta_is_technical - whether the document is technical, {'false', 'true'}
  • meta_judge_nm_last - last names of the judges (list of strings)
  • meta_case_nbr - formal number of the case (same as CaseDesc)
  • meta_verdict_ty - type of the decision (same as Type)
  • meta_lawyer_nm - list of lawyer names, list of strings or None
  • meta_judge_nm_first - list of judges' first names, list of strings
  • meta_verdict_pages - number of pages in the document
  • meta_inyan_nm - court בג"ץ
  • meta_court_nm - court (e.g. בית המשפט העליון )
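As a quick consistency check on the Type counts listed above, the sketch below sums them and compares the result with the 751,194 documents quoted in the Dataset Summary. The counts are copied from this card; the variable names are illustrative only.

```python
# Document counts per Type value, copied from the field list in this card.
TYPE_COUNTS = {
    'פסק-דין': 84339,
    'החלטה': 663099,
    'צו ביניים': 22,
    'פסקי דין באנגלית': 310,
    'צו על תנאי': 200,
    'צו': 2606,
    'פד"י': 302,
    'תקצירים': 316,
}

TOTAL_DOCUMENTS = 751_194  # figure quoted in the Dataset Summary

# Sum the per-type counts and compute each type's share of the corpus.
total = sum(TYPE_COUNTS.values())
shares = {name: count / total for name, count in TYPE_COUNTS.items()}
largest_type = max(TYPE_COUNTS, key=TYPE_COUNTS.get)

print("counts match summary:", total == TOTAL_DOCUMENTS)
print(f"{largest_type}: {shares[largest_type]:.1%} of all documents")
```

The counts add up to the total in the summary. Decisions (החלטה) make up roughly 88% of the documents, so the classes are heavily imbalanced, which is worth keeping in mind when sampling or training classifiers on the Type field.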

Data Splits

The entire dataset is provided as a single 'train' split.

Dataset Creation

2023-04-22

Curation Rationale

[More Information Needed]

Source Data

https://supreme.court.gov.il/

Initial Data Collection and Normalization

The data was collected by crawling the Israeli Supreme Court website.

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

The data contained in this dataset is public.

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Prof. Lev Muchnik, Hebrew University of Jerusalem; Dr. Inbal Yahav Shenberger, Tel Aviv University

Licensing Information

[More Information Needed]

Citation Information

Lev Muchnik, Inbal Yahav, Ariel Nevo, Avichay Chriqui, Tim Shektov, 2023, The Israeli Supreme Court Dataset

Contributions

The authors would like to thank the Israeli Innovation Authority (grants #78560 and #78561) for its support in creating this dataset.