Welcome to ULITE workshop at JCDL 2022
Understanding literature references is crucial for the process of maintaining access to previous research as well as gaining new insights from the structure and content that published scientific papers build on.
With the growing number of publications and academic full texts, curating databases of literature manually becomes impossible. By automatic extraction and parsing of references we release the research community from this burden and create different opportunities for deeper analysis of literature. One aspect of this is understanding of publication history: how authors influence each other, which institutions are involved in certain fields and topics and so on. One way to achieve this is to build a reference graph using the citation information inside the papers. If such a system exists, one can easily access information about further work of a particular author on the topic as well as related work, location of relevant datasets and which other projects the institution is pursuing.
It is worth mentioning that while in the field of Computer Science and some other domains multiple initiatives and infrastructures for automatic processing of scientific literature exist (Semantic Scholar, Google Scholar, AMiner, CiteSeerX, etc.), for many other fields the situation is drastically different. Social sciences, law, and history (to name a few) scholars often have to rely on fragmented data sources of full-text documents to navigate their fields.
Call for papers
The workshop on Understanding Literature References in Academic Full Text (ULITE), co-located with JCDL 2022, invites submissions on the topics of processing of literature references.
We invite submissions on the following and related topics (but are not limited to):
- Reference extraction from full texts (e.g. Hosseini et al., 2019)
- Reference segmentation/parsing (e.g. Boukhers et al., 2019)
- Reference type detection
- Reference deduplication
- Entity identification, disambiguation and linking (authors, affiliations, publication outlets, locations, etc.) (Backes, 2018)
- Understanding references to datasets and software
- Datasets and testbeds of reference data
- Identification and resolution of “non-source items” (Chi, 2014) aka matching references outside of a given collection (open-world assumption)
- Open infrastructures and services for reference mining
- Large-Scale matching to (open) infrastructures
- Domain-specific approaches and applications
- Search, exploration and mining of the reference graph
All deadlines are 11:59 pm UTC -12h (“anywhere on Earth”).
- Deadline for submission: May 2, 2022
- Notification of acceptance: May 30, 2022
- Camera ready: June 10, 2022
- Workshop: June 24, 2022
Regular papers: All submissions must be written in English, following the CEUR Proceedings template (10 pages for full papers and 4 pages for short papers exclusive of unlimited pages for references) and should be submitted as PDF files to EasyChair (see below).
Poster & demonstration: We welcome submissions detailing original, early findings, works in progress and industrial applications of knowledge entities extraction ande evaluation for a special poster session, possibly with a 2-minute presentation in the main session. Some research track papers will also be invited to the poster track instead, although there will be no difference in the final proceedings between poster and research track submissions. These papers should follow the same format as the research track papers but can be shorter (2 pages for poster and demo papers).
CEURART (incl. LaTeX and Word templates) https://ceurws.wordpress.com/2020/03/31/ceurws-publishes-ceurart-paper-style/
Submit via EasyChair: https://easychair.org/conferences/?conf=ulite2022
Peer Review Progress and Workshop Format
Our peer review process will be supported by Easychair. Each submission is assigned to 2 to 3 reviewers, preferably at least one expert in Bibliometrics and one expert in NLP or Information Extraction. The programme committee will consist of peer reviewers from all participating communities. Accepted papers are either long papers (15-minute talks) or short papers (5-minute talks).
The workshop is planned to last half a day and will include a keynote, invited papers, paper presentations and a closing panel discussion on the needs of the different stakeholders involved in the field of reference understanding. The expected number of participants is approx. 40 people and we plan the event as an on-site event. If the pandemic situation still holds we will switch to an online format (Nunes et al., 2020). A typical seminar-style room with a projector for 30-50 delegates would be required. A/V capacities would be desirable to potentially invite speakers to give an online presentation. For the on-site poster session, we would require a small number of poster boards. The workshop would follow the general JCDL format (this year: online).
Objectives and Target Audience
The goal of this workshop is to engage communities interested in the broad topic of literature reference understanding and automatic processing of scientific fulltext publications. Our workshop has a focus on working with open infrastructures/tools and offering the extracted information as open data for reuse. Our view is to expose people from one community to the work of the respective other community and to foster fruitful interaction across communities.
The target audience of the workshop are researchers and practitioners, junior and senior, from Natural Language Processing (NLP) and Information Extraction as well as Information Retrieval and Bibliometrics/Scientometrics. These could be IR/NLP researchers interested in potential new application areas for their work as well as researchers and practitioners working with bibliometric data and interested in how IR/NLP methods can make use of such data.
We particularly welcome submissions (research paper and technical contributions like) that address the topics for underrepresented domains such as, references in languages other than English and from fields other than Computer Science.
Some of the expected outcomes of the workshop include:
- Identification of under researched topics
- Identification of weaknesses of current approaches (especially when generalized to new domains)
- Workshop Proceedings are planned to be published with CEUR
Keynotes and invited speakers
- Silvio Peroni (accepted). Silvio Peroni holds a Ph.D. degree in Computer Science and he is an Associate Professor at the Department of Classical Philology and Italian Studies, University of Bologna. He is an expert in document markup and semantic descriptions of bibliographic entities using Semantic Web technologies. In his keynote he will talk about his project Open Citations
All questions about submissions should be emailed to firstname.lastname@example.org.
Previous related workshops
- EXCITE Workshop 2017: “Challenges in Extracting and Managing References”: community meeting/workshop during our precursor project EXCITE held in Cologne with approx. 30 on-site and remote experts. Organizers: Philipp Mayr & Steffen Staab
- Workshop on Open Citations 2018. First Workshop on Open Citations held in Bologna, Italy with approx. 90 on-site participants. Organizers: Silvio Peroni, David Shotton, Philipp Mayr, Steffen Staab et al.
- Workshop on Open Citations And Open Scholarly Metadata 2020 (Online event). Second Workshop on Workshop On Open Citations And Open Scholarly Metadata held online in September 2020 with >100 on-line participants. Organizers: Silvio Peroni, David Shotton, Philipp Mayr, Steffen Staab et al.
Biographies of the Organizers
Anastasiia Iurshina is a PhD student in the Analytic Computing group, University of Stuttgart. She is involved in the OUTCITE project. Her interests include processing of literature references as well as more general natural processing tasks such as entity linking.
Muhammad Ahsan Shahid is a software developer at the GESIS - Leibniz-Institute for the Social Sciences department Knowledge Technologies for the Social Sciences (KTS). He received his Master’s degree in Computer Science from the Technische Universität Berlin. His interests include development, operating data and NLP. He is also a member of the enthusiastic team responsible for OUTCITE.
Tobias Backes is a PhD student at GESIS - Leibniz-Institute for the Social Sciences department Knowledge Technologies for the Social Sciences (KTS). He received his Master’s degree in Language Science and Technology from Saarland University. His interests include entity resolution problems such as author disambiguation, institution resolution and duplicate detection. He is also a software contributor in OUTCITE.
Philipp Mayr is team leader at the GESIS - Leibniz-Institute for the Social Sciences department Knowledge Technologies for the Social Sciences (WTS). He received his PhD in applied informetrics and information retrieval from the Berlin School of Library and Information Science at Humboldt University Berlin. His research group focuses on methods and techniques for interactive information retrieval and data set search. He was the main organizer of the BIR workshops at ECIR 2014-2020 and is the co-PI of the EXCITE and OUTCITE projects.
Steffen Staab is professor for Analytic Computing at University of Stuttgart and professor for Web and Computer Science at University of Southampton. Steffen is a fellow of the European Association for Artificial Intelligence. His research interests lie at the intersection of knowledge graphs and machine learning with scientific knowledge graphs being one of the points where these interests meet.
Hosseini, A., Ghavimi, B., Boukhers, Z., & Mayr, P. (2019). EXCITE - A toolchain to extract, match and publish open literature references. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries 2019, 432–433. https://doi.org/10.1109/JCDL.2019.00105
Z. Boukhers, S. Ambhore and S. Staab, “An End-to-End Approach for Extracting and Segmenting High-Variance References from PDF Documents,” 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019, pp. 186-195, doi: 10.1109/JCDL.2019.00035.
Chi, P.-S. (2014). Which role do non-source items play in the social sciences? A case study in political science in Germany. Scientometrics, 101(2), 1195–1213. https://doi.org/10.1007/s11192-014-1433-1
Nunes, S., Little, S., Bhatia, S., Boratto, L., Cabanac, G., Campos, R., Couto, F. M., Faralli, S., Frommholz, I., Jatowt, A., Jorge, A., Marras, M., Mayr, P., & Stilo, G. (2020). ECIR 2020 Workshops: Assessing the Impact of Going Online. SIGIR Forum, 54(1). http://sigir.org/wp-content/uploads/2020/06/p09.pdf
Backes, Tobias. 2018. “Effective unsupervised author disambiguation with relative frequencies.” In Proceedings of the 16th ACM/IEEE Joint Conference on Digital Libraries (JCDL’18), edited by Jiangping Chen, Marcos André Gonçalves, and Jeff M. Allen, doi: http://dx.doi.org/10.1145/3197026.3197036.