Database Compression for Intelligent On-board Vehicle Controllers

The vehicle fleet of public transportation companies is often equipped with intelligent on-board passenger information systems. A frequently used but time and labor-intensive way for keeping the on-board controllers up-to-date is the manual update using different memory cards (e.g. flash cards) or portable computers. This paper describes a compression algorithm that enables data transmission using low bandwidth wireless radio networks (e.g. GPRS) by minimizing the amount of data traffic. In typical cases it reaches a compression rate of an order of magnitude better than that of the general purpose compressors. Compressed data can be easily expanded by the low-performance controllers, too.

Tracking Activity of Real Individuals in Web Logs

This paper describes an enhanced cookie-based method for counting the visitors of web sites by using a web log processing system that aims to cope with the ambitious goal of creating countrywide statistics about the browsing practices of real human individuals. The focus is put on describing a new more efficient way of detecting human beings behind web users by placing different identifiers on the client computers. We briefly introduce our processing system designed to handle the massive amount of data records continuously gathered from the most important content providers of the Hungary. We conclude by showing statistics of different time spans comparing the efficiency of multiple visitor counting methods to the one presented here, and some interesting charts about content providers and web usage based on real data recorded in 2007 will also be presented.

Concurrency without Locking in Parallel Hash Structures used for Data Processing

Various mechanisms providing mutual exclusion and thread synchronization can be used to support parallel processing within a single computer. Instead of using locks, semaphores, barriers or other traditional approaches in this paper we focus on alternative ways for making better use of modern multithreaded architectures and preparing hash tables for concurrent accesses. Hash structures will be used to demonstrate and compare two entirely different approaches (rule based cooperation and hardware synchronization support) to an efficient parallel implementation using traditional locks. Comparison includes implementation details, performance ranking and scalability issues. We aim at understanding the effects the parallelization schemes have on the execution environment with special focus on the memory system and memory access characteristics.

Automatic Rearrangement of Localized Graphical User Interface

The localization of software products is essential for reaching the users of the international market. An important task for this is the translation of the user interface into local national languages. As graphical interfaces are usually optimized for the size of the texts in the original language, after the translation certain user controls (e.g. text labels and buttons in dialogs) may grow in such a manner that they slip above each other. This not only causes an unpleasant appearance but also makes the use of the program more difficult (or even impossible) which implies that the arrangement of the controls must be corrected subsequently. The correction should preserve the original structure of the interface (e.g. the relation of logically coherent controls), furthermore, it is important to keep the nicely proportioned design: the formation of large empty areas should be avoided. This paper describes an algorithm that automatically rearranges the controls of a graphical user interface based on the principles above. The algorithm has been implemented and integrated into a translation support system and reached results pleasant for the human eye in most test cases.

Analysis of Web User Identification Methods

Web usage mining has become a popular research area, as a huge amount of data is available online. These data can be used for several purposes, such as web personalization, web structure enhancement, web navigation prediction etc. However, the raw log files are not directly usable; they have to be preprocessed in order to transform them into a suitable format for different data mining tasks. One of the key issues in the preprocessing phase is to identify web users. Identifying users based on web log files is not a straightforward problem, thus various methods have been developed. There are several difficulties that have to be overcome, such as client side caching, changing and shared IP addresses and so on. This paper presents three different methods for identifying web users. Two of them are the most commonly used methods in web log mining systems, whereas the third on is our novel approach that uses a complex cookie-based method to identify web users. Furthermore we also take steps towards identifying the individuals behind the impersonal web users. To demonstrate the efficiency of the new method we developed an implementation called Web Activity Tracking (WAT) system that aims at a more precise distinction of web users based on log data. We present some statistical analysis created by the WAT on real data about the behavior of the Hungarian web users and a comprehensive analysis and comparison of the three methods