ERPA publications

ERPA Articles

Analysis of the maximal pattern mining method and its variants
In this paper, within the framework of process mining, we examine the Maximal Pattern Mining method introduced by Liesaputra et al. in [1]. This method constructs a transition graph, i.e. a labelled directed graph, for traces with similar structure. The idea behind the algorithm is to analyze the traces in the event log and identify loops, parallel events and optionality between them in order to determine the maximal patterns. In [1], the authors provide pseudocode for the skeleton of their algorithm and discuss some parts of it, but leave other parts without detail. Here, we briefly discuss the steps of the algorithm and elaborate on the steps that are not explained in [1]. We also introduce some new subroutines to handle loops and parallel and optional sequences.
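
As a rough illustration of what a labelled directed graph over traces can look like, the following minimal sketch builds a plain directly-follows graph from a few traces. It is not the Maximal Pattern Mining algorithm of [1]; the function names, the sample traces and the use of edge counts as labels are all assumptions made for this example.

```python
from collections import defaultdict


def build_transition_graph(traces):
    """Return {source: {target: count}} for directly-follows pairs.

    Edge labels are occurrence counts; this labelling is an assumption,
    not the labelling used in [1].
    """
    graph = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        # Pair every event with its immediate successor in the trace.
        for src, dst in zip(trace, trace[1:]):
            graph[src][dst] += 1
    return {src: dict(dsts) for src, dsts in graph.items()}


def self_loops(graph):
    """Events that directly follow themselves (the simplest kind of loop)."""
    return sorted(event for event, dsts in graph.items() if event in dsts)


# Hypothetical traces: "b" and "c" occur in both orders, "b" repeats once.
traces = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "b", "d"],
]
graph = build_transition_graph(traces)
loops = self_loops(graph)
```

In this toy input, edges in both directions between "b" and "c" hint at possible parallelism, and the "b" to "b" edge is a length-one loop; recognising such structures reliably is what the method discussed in the paper addresses.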

Application of deep learning algorithms detecting fake and correct textual or verbal news
The ongoing spread and expansion of information technology and social media sites has made it easier for people to access different types of news - political, economic, medical, social etc. - through these platforms. This rapid growth in news outlets and the demand for information has blurred the line between real and fake news and led to the dissemination of fake news, which is a dangerous state of affairs. The outbreak of the coronavirus pandemic and the rising awareness of its dangers across the globe saw a parallel rise in fake news and rumors, as well as unsubstantiated statements and deceptive ideas. The main aim of this study is to help overcome these kinds of problems in the future by applying deep learning algorithms (LSTM, Bi-LSTM, BERT) to a large dataset (39279 rows) in order to identify fake and correct textual or verbal news. The results obtained with the different algorithms show that the BERT model performed best, achieving a text classification accuracy of 96.63%.

Performance analysis of low dimensional word embeddings to support green computing
It has become increasingly important to pay attention to how much energy we use to operate various Artificial Intelligence (AI) and Machine Learning (ML) systems. In order to implement environmentally responsible solutions, we need to reconsider the storage resources and computational power we use. Training a natural language model is a time- and energy-demanding process. In recent years, language models have become extremely large, and this trend continues. Building these models consumes an extremely large amount of computational power and hence demands huge amounts of energy. In our research, we trained and evaluated low-dimensional word2vec embedding models and analyzed their performance in building transition-based dependency parsers to show that low-dimensional models are still competitive and may be sufficient in many use cases.

Event sequence segmentation for parallel processes
Robotic process mining focuses on the analysis of historical process sequences in order to build a process model for the investigated field. One of the main tasks in robotic process mining is the construction of a process schema for the input sequences. The usual methods are able to generate models using only baseline graph structures. In order to support high-level structures like parallelism, the input event sequence structure must support additional attributes on the events. This paper presents a novel approach to sequence segmentation that provides an intermediate graph structure which can be used to mine complex graph patterns. The tested prototype system contains a Python-based implementation of the proposed algorithm. The paper presents some tests to illustrate the suitability of the proposed model.

Activity logs in practice
Modern information technology is now present in virtually all areas. Increasingly complex processes call for complex information systems that can support them effectively. A large amount of data flows through these information systems, and examining it has become essential. RPA offers a solution for this, allowing partial or even complete automation of processes. Activity logs generated in practice can be one of the important basic units of RPA. This publication reviews this area. The most important formats are briefly presented, followed by a runtime model that aims to hide the differences between the formats and achieve a general structure. Building on this module, an MLP model that predicts atomic events is finally presented. The publication approaches the problem from a practical point of view and demonstrates the effectiveness of the model with test results.

Conversion of customer service event logs to standard formats
The act of logging the events (transactions, errors, intrusions, etc.) happening within an information system is about the same age as the system itself. Mining these historical records, however, is a recent demand to support robotic process automation initiatives. Our goal is to create an RPA solution for heavily overloaded customer services and we now face the problem of getting logs with different syntax and structure. This paper presents the standard event log formats and reviews the steps of transforming the most frequent non-standard log formats into a uniform formalism.
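
The kind of transformation described above can be sketched as follows, assuming a semicolon-separated raw log and a uniform record with a case identifier, an activity name and an ISO 8601 timestamp. The column names (`ticket`, `action`, `when`), the sample rows and the target field names are hypothetical and are not taken from the paper.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw customer-service log; layout and values are invented.
RAW_LOG = """ticket;action;when
T1;opened;2021-03-01 09:00
T2;opened;2021-03-01 10:15
T1;answered;2021-03-01 09:30
"""


def to_uniform_events(raw_text, delimiter=";"):
    """Map each raw row to a uniform {case_id, activity, timestamp} record."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    events = []
    for row in reader:
        stamp = datetime.strptime(row["when"], "%Y-%m-%d %H:%M")
        events.append({
            "case_id": row["ticket"],
            "activity": row["action"],
            "timestamp": stamp.isoformat(),  # e.g. 2021-03-01T09:00:00
        })
    return events


def group_by_case(events):
    """Order events by time and collect the activities of each case."""
    traces = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        traces.setdefault(event["case_id"], []).append(event["activity"])
    return traces


events = to_uniform_events(RAW_LOG)
traces = group_by_case(events)
```

Once every source format is mapped to the same record shape, downstream steps only need to handle that one shape; serialising the records to a concrete standard format would be a separate step not shown here.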

Process Mining of Parallel Sequences with Neural Network Technologies
Process mining is an important tool for the automatic discovery of workflow process schemas. The dominant process mining technologies use either automaton-based engines or neural network engines. The main benefits of the machine learning based methods are their time and scale efficiency, but they still have some limitations regarding schema flexibility. The paper introduces a novel approach for mining parallel sequences, which is a hard problem for current neural network engines. The analysis performed and the test results show that the proposed model is able to induce good-quality schemas, in many cases of better quality than those of the base methods.