Accepted Papers

  • Data Mining Applications On Transportation, Especially On Multimodal Transportation : A Literature Review
    Aysun MUTLU and Ayda AMNIATTALAB, Sabanci University, Istanbul, Turkey.
    Multimodal transport has entered the transportation sector to meet the growing demand for freight transport and to offer an alternative to single-mode freight transport. Multimodal transport, which is a dynamic process, is considered to be the door-to-door transport of freight using at least two modes of transport; these modes are usually road, sea and rail. Various transportation engineering problems have been researched in the context of data mining; however, multimodal transportation is new from the data mining point of view. Data mining is currently a hot research area and is applied in databases, artificial intelligence, statistics, logistics, and so on. It can discover valuable knowledge and patterns in large-scale databases for users. This paper reviews the literature on data mining techniques and applications used in transportation, especially in multimodal transportation. In addition to data mining techniques, which are rarely used in this area of research, we also take into account multi-criteria decision making approaches. Moreover, the study concludes with a classification of these solution techniques.
  • Mining Arabic Social Media For Human Rights Abuse
    Ayman Alhelbawy (1,2,3), Udo Kruschwitz (1) and Massimo Poesio (1); (1) University of Essex, UK, (2) Minority Rights Group, UK, (3) Fayoum University, Egypt
    It has become very hard for human rights organisations to get reports from countries like Iraq or Syria, as monitoring the human rights situation through the usual processes is not safe anymore. Mining social media might be a solution to the problem of identifying (potential) human rights abuses safely. In this paper, we present preliminary results on classifying a corpus of violent actions extracted from Arabic Twitter to detect Human Rights Abuse (HRA), a new and challenging task. We show that whereas the commonality in terms between violent actions leads to poor results when using clustering methods, supervised methods can achieve usable results considering some lexical and morphological features.
  • Towards Optimising Feature Extraction for Deep Learning
    Dinesh Kumar and Dharmendra Sharma, University of Canberra, Australia.
    Deep Learning (DL) has recently been gaining increasing research interest and presents many technical challenges and much potential. This paper investigates an optimisation model for feature extraction for DL (FEDL) based on two existing algorithms: Principal Component Analysis (PCA) and the Restricted Boltzmann Machine (RBM). FEDL is motivated by computational efficiency and the optimisation of outcomes from complex real-world datasets. PCA is a linear model and has been successfully developed as a statistical tool for extracting features in traditional analytics. RBM is a new type of machine learning tool with strong representational power, which has been utilized as the feature extractor in a large variety of classification problems [1]. It is a non-linear model. We aim to find which feature extraction algorithm's output lets a deep learning classifier produce improved outcomes on a complex cancer data set. The experiments show that deep learning architectures can handle datasets without feature extraction and produce similar or better results, given their complex neuron structures. To compare the classification accuracy of deep nets, we also injected the features extracted by PCA and RBM into K-Nearest Neighbours (KNN) and Support Vector Machine (SVM) classifiers. First, we obtained classification accuracy results by applying the classifiers to raw data, using the full feature sets as input. Then we filtered the datasets using the PCA and RBM algorithms and used the resulting set as input to a deep fully connected neural network, KNN and SVM. Our experiments show that this pre-processing improves the overall generalisation results of the trained deep models but increases the training times of the net. The results from the proposed hybridised approach for feature extraction show improvements and significant promise for future research and development in data mining through deep learning.
  • The Annual Report Algorithm : Retrieval Of Financial Statements And Extraction Of Textual Information
    Jorg Hering, University of Erlangen-Nurnberg, Germany
    U.S. corporations are obligated to file financial statements with the U.S. Securities and Exchange Commission (SEC). The SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, which contains millions of financial statements, is one of the most important sources of corporate information available. The paper illustrates which financial statements are publicly available by analyzing the entire SEC EDGAR database since its implementation in 1993. It shows how to retrieve financial statements from EDGAR in a fast and efficient way. The key contribution, however, is a platform-independent algorithm for business and research purposes designed to extract textual information embedded in financial statements. The dynamic extraction algorithm, which is capable of identifying structural changes within financial statements, is applied to more than 180,000 annual reports on Form 10-K filed with the SEC for descriptive statistics and validation purposes.
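
The abstract above does not describe the extraction algorithm itself, so the following is only a minimal sketch of what pulling one textual section out of a Form 10-K could look like. It assumes the report is already available as plain text and that sections are introduced by headings such as "Item 1A." at the start of a line. The function name `extract_item_section`, the heading pattern and the sample text are all hypothetical and are not taken from the paper.

```python
import re

# Assumed heading format: "Item 1A." or "ITEM 7 -" at the start of a line.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[ab]?)\b[.:\-\s]*",
    re.IGNORECASE | re.MULTILINE,
)


def extract_item_section(report_text, item):
    """Return the text between the heading of `item` and the next item heading.

    Returns an empty string when no heading for `item` is found.
    """
    headings = list(ITEM_HEADING.finditer(report_text))
    wanted = item.lower()
    best = ""
    for i, match in enumerate(headings):
        if match.group(1).lower() != wanted:
            continue
        if i + 1 < len(headings):
            end = headings[i + 1].start()
        else:
            end = len(report_text)
        body = report_text[match.end():end].strip()
        # A table of contents repeats each heading with little text after it,
        # so keep the longest candidate rather than the first one.
        if len(body) > len(best):
            best = body
    return best


# Hypothetical miniature report: a two-line table of contents, then the body.
sample = """Item 1. Business
Item 1A. Risk Factors
Item 1. Business
We make widgets.
Item 1A. Risk Factors
Demand may fall.
Supply may tighten.
Item 2. Properties
None.
"""

risk = extract_item_section(sample, "1A")
print(risk)
```

Keeping the longest match is one simple way to skip a table of contents; real filings vary widely in layout, so a pattern like this would need validation against actual reports.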
Copyright © COSIT 2017