For detailed citation information please check my Google Scholar page.
2022
Doc2KG: Transforming Document Repositories to Knowledge Graphs
Stylianou, Nikolaos,
Vlachava, Danai,
Konstantinidis, Ioannis,
Bassiliades, Nick,
and Peristeras, Vasileios
International Journal on Semantic Web and Information Systems (IJSWIS)
2022
Document Management Systems (DMS) have been used for decades to store large amounts of information in textual form. Their technology paradigm is based on storing vast quantities of textual information enriched with metadata to support searchability. However, this approach exhibits limitations, as it treats textual information as a black box and relies exclusively on user-created metadata, a process that suffers from quality and completeness shortcomings. The use of knowledge graphs in DMS can substantially improve searchability, providing the ability to link data and enabling semantic search. Recent approaches focus on either creating knowledge graphs from document collections or updating existing ones. In this paper, we introduce Doc2KG (Document-to-Knowledge-Graph), an intelligent framework that handles both the creation and the real-time updating of a knowledge graph, while also exploiting domain-specific ontology standards. We use DIAVGEIA (clarity), an award-winning Greek open government portal, as our case study and discuss new capabilities for the portal enabled by implementing Doc2KG.
2021
CoreLM: Coreference-aware Language Model Fine-Tuning
In Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
2021
Language Models underpin all modern Natural Language Processing (NLP) tasks. The introduction of the Transformer architecture has contributed significantly to making Language Modeling very effective across many NLP tasks, leading to significant advancements in the field. However, Transformers come with a large computational cost, which grows quadratically with respect to the input length. This presents a challenge, as understanding long texts requires a lot of context. In this paper, we propose a fine-tuning framework, named CoreLM, that extends the architecture of current Pretrained Language Models so that they incorporate explicit entity information. By introducing entity representations, we make information outside the contextual space of the model available, which results in a better Language Model for a fraction of the computational cost. We implement our approach using GPT2 and compare the fine-tuned model to the original. Our proposed model achieves a lower Perplexity on the GUMBY and LAMBADA datasets when compared to GPT2 and a fine-tuned version of GPT2 without any changes. We also compare the models’ performance in terms of Accuracy on LAMBADA and the Children’s Book Test, with and without the use of model-created coreference annotations.
Improved Biomedical Entity Recognition via Longer Context Modeling
In Artificial Intelligence Applications and Innovations
2021
Biomedical Named Entity Recognition is a difficult task, aimed at identifying all named entities in medical literature. The importance of the task becomes apparent as these entities are used to identify key features, enable better search results and can accelerate the process of reviewing evidence related to a medical case. This practice is known as Evidence-Based Medicine (EBM) and is used globally by medical practitioners who do not have the time to read all the latest developments in their respective fields. In this paper, we propose a methodology that achieves state-of-the-art results on a plethora of Biomedical Named Entity Recognition datasets, with a lightweight approach that requires minimal training. Our model is end-to-end and capable of efficiently modeling significantly longer sequences than previous models, benefiting from inter-sentence dependencies.
TransforMED: End-to-End Transformers for Evidence-Based Medicine and Argument Mining in medical literature
Journal of Biomedical Informatics
2021
Argument Mining (AM) refers to the task of automatically identifying arguments in a text and finding their relations. In medical literature, this is done by identifying Claims and Premises and classifying their relations as either Support or Attack. Evidence-Based Medicine (EBM) refers to the task of identifying all related evidence in medical literature to allow medical practitioners to make informed choices and form accurate treatment plans. This is achieved through the automatic identification of Population, Intervention, Comparator and Outcome (PICO) entities in the literature to limit the collection to only the most relevant documents. In this work, we combine EBM with AM in medical literature to increase the performance of the individual models and create high-quality argument graphs, annotated with PICO entities. To that end, we introduce a state-of-the-art EBM model, used to predict the PICO entities, and two novel Argument Identification and Argument Relation classification models that utilize the PICO entities to enhance their performance. Our final system works as a pipeline and is able to identify all PICO entities in a medical publication, the arguments presented in it and their relations.
A neural Entity Coreference Resolution review
Expert Systems with Applications
2021
Entity Coreference Resolution is the task of resolving all mentions in a document that refer to the same real-world entity and is considered one of the most difficult tasks in natural language understanding. It is of great importance for downstream natural language processing tasks such as entity linking, machine translation, summarization, chatbots, etc. This work aims to give a detailed review of current progress on solving Coreference Resolution using neural-based approaches. It also provides a detailed appraisal of the datasets and evaluation metrics in the field, as well as of the subtask of Pronoun Resolution, which has seen various improvements in recent years. We highlight the advantages and disadvantages of the approaches, the challenges of the task and the lack of agreed-upon standards, and propose a way to further expand the boundaries of the field.
2020
E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks
In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
2020
In the last decade, the field of Neural Language Modelling has witnessed enormous changes, with the development of novel models through the use of Transformer architectures. However, even these models struggle to model long sequences due to memory constraints and increasing computational complexity. Coreference annotations over the training data can provide context far beyond the modelling limitations of such language models. In this paper, we present an extension over the Transformer-block architecture used in neural language models, specifically in GPT2, in order to incorporate entity annotations during training. Our model, GPT2E, extends the Transformer layers architecture of GPT2 to Entity-Transformers, an architecture designed to handle coreference information when present. To that end, we achieve richer representations for entity mentions, at a negligible training cost. We show the comparative model performance between GPT2 and GPT2E in terms of Perplexity on the CoNLL 2012 and LAMBADA datasets, as well as the key differences in the entity representations and their effect on downstream tasks such as Named Entity Recognition. Furthermore, our approach can be adopted by the majority of Transformer-based language models.
EBM+: Advancing Evidence-Based Medicine via two level automatic identification of Populations, Interventions, Outcomes in medical literature
Artificial Intelligence in Medicine
2020
Evidence-Based Medicine (EBM) has been an important practice for medical practitioners. However, as the number of medical publications increases dramatically, it is becoming extremely difficult for medical experts to review all the content available and make an informed treatment plan for their patients. A variety of frameworks, including the PICO framework, which is named after its elements (Population, Intervention, Comparison, Outcome), have been developed to enable fine-grained searches as the first step to faster decision making. In this work, we propose a novel entity recognition system that identifies PICO entities within medical publications and achieves state-of-the-art performance in the task. This is achieved by the combination of four 2D Convolutional Neural Networks (CNNs) for character feature extraction, and a Highway Residual connection to facilitate deep Neural Network architectures. We further introduce a PICO Statement classifier that identifies sentences which not only contain all PICO entities but also answer questions stated in PICO. To facilitate this task we also introduce a high-quality dataset, manually annotated by medical practitioners. With the combination of our proposed PICO Entity Recognizer and PICO Statement classifier, we aim to advance EBM and enable its faster and more accurate practice.
2019
A Neural Entity Coreference Resolution Review
2019
2018
Real Time Location Based Sentiment Analysis on Twitter: The AirSent System
In Proceedings of the 10th Hellenic Conference on Artificial Intelligence
2018
The widespread use of Social Media creates great opportunities for businesses. By combining clever techniques, companies can develop powerful data analysis systems to understand their customers. This paper presents a real-time sentiment analysis and location inference system, showcased via AirSent, an R-based application designed to assist airline carriers in measuring their passengers’ satisfaction. AirSent can download, classify and locate tweets within seconds, presenting the results in interactive maps.