Arabic Word Semantic Similarity

This paper is concerned with the production of an Arabic word semantic similarity benchmark dataset. It is the first of its kind for Arabic which was particularly developed to assess the accuracy of word semantic similarity measurements. Semantic similarity is an essential component to numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English. To the best of our knowledge, there is no word similarity measure developed specifically for Arabic. In this paper, an Arabic benchmark dataset of 70 word pairs is presented. New methods and best possible available techniques have been used in this study to produce the Arabic dataset. This includes selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. This dataset will make a substantial contribution to future work in the field of Arabic WSS and hopefully it will be considered as a reference basis from which to evaluate and compare different methodologies in the field.

Functioning of Turkic Elements in Modern Hindi

It is discussed about modern usage of adopted words and their vocabularies, Turkism usage fields, phonetic, grammatical and lexis-semantic assimilation of the typological-morphological structures of entering to different Hindi languages in comparative typological aspects in this scientific article. The lexis vocabulary is rich, the prevalence area is wide and it has researched the entering process of vocabulary into the great languages of Turkic elements from the speakers- numbers. The research work has worked on the base of Hindi vocabulary.

The Framework for Adaptive Games for Mobile Application Using Neural Networks

The rapid development of the BlackBerry games industry and its development goals were not just for entertainment, but also used for educational of students interactively. Unfortunately the development of adaptive educational games on BlackBerry in Indonesian language that interesting and entertaining for learning process is very limited. This paper shows the research of development of novel adaptive educational games for students who can adjust the difficulty level of games based on the ability of the user, so that it can motivate students to continue to play these games. We propose a method where these games can adjust the level of difficulty, based on the assessment of the results of previous problems using neural networks with three inputs in the form of percentage correct, the speed of answer and interest mode of games (animation / lessons) and 1 output. The experimental results are presented and show the adaptive games are running well on mobile devices based on BlackBerry platform

Assamese Numeral Speech Recognition using Multiple Features and Cooperative LVQ -Architectures

A set of Artificial Neural Network (ANN) based methods for the design of an effective system of speech recognition of numerals of Assamese language captured under varied recording conditions and moods is presented here. The work is related to the formulation of several ANN models configured to use Linear Predictive Code (LPC), Principal Component Analysis (PCA) and other features to tackle mood and gender variations uttering numbers as part of an Automatic Speech Recognition (ASR) system in Assamese. The ANN models are designed using a combination of Self Organizing Map (SOM) and Multi Layer Perceptron (MLP) constituting a Learning Vector Quantization (LVQ) block trained in a cooperative environment to handle male and female speech samples of numerals of Assamese- a language spoken by a sizable population in the North-Eastern part of India. The work provides a comparative evaluation of several such combinations while subjected to handle speech samples with gender based differences captured by a microphone in four different conditions viz. noiseless, noise mixed, stressed and stress-free.

A Study on Fantasy Images Represented on the Films: Focused on Mise-en-Scène Element

The genre of fantasy depicts a world of imagine that triggers popular interest from a created view of world, and a fantasy is defined as a story that illustrates a world of imagine where scientific or horror elements are stand in its center. This study is not focused on the narrative of the fantasy, i.e. not on the adventurous story, but is concentrated on the image of the fantasy to work on its relationship with intended themes and differences among cultures due to meanings of materials. As for films, we have selected some films in the 2000's that are internationally recognized as expressing unique images of fantasy containing the theme of love in them. The selected films are 5 pieces including two European films, Amelie from Montmartre (2001) and The Science of Sleep (2005) and three Asian films, Citizen Dog from Thailand (2004), Memories of Matsuko from Japan (2006), and I'm a Cyborg, but That's OK from Korea (2006). These films share some common characteristics to the effect that they give tiny lessons and feelings for life with expressions of fantasy images as if they were fairy tales for adults and that they lead the audience to reflect on their days and revive forgotten dreams of childhood. We analyze the images of fantasy in each of the films on the basis of the elements of Mise-en-Scène (setting and props, costume, hair and make-up, facial expressions and body language, lighting and color, positioning of characters, and objects within a frame).

Data and Control Flow Analysis of VDMµ Specifications

Formal Specification languages are being widely used for system specification and testing. Highly critical systems such as real time systems, avionics, and medical systems are represented using Formal specification languages. Formal specifications based testing is mostly performed using black box testing approaches thus testing only the set of inputs and outputs of the system. The formal specification language such as VDMµ can be used for white box testing as they provide enough constructs as any other high level programming language. In this work, we perform data and control flow analysis of VDMµ class specifications. The proposed work is discussed with an example of SavingAccount.

A Study on the Secure ebXML Transaction Models

ebXML (Electronic Business using eXtensible Markup Language) is an e-business standard, sponsored by UN/CEFACT and OASIS, which enables enterprises to exchange business messages, conduct trading relationships, communicate data in common terms and define and register business processes. While there is tremendous e-business value in the ebXML, security remains an unsolved problem and one of the largest barriers to adoption. XML security technologies emerging recently have extensibility and flexibility suitable for security implementation such as encryption, digital signature, access control and authentication. In this paper, we propose ebXML business transaction models that allow trading partners to securely exchange XML based business transactions by employing XML security technologies. We show how each XML security technology meets the ebXML standard by constructing the test software and validating messages between the trading partners.

Query Algebra for Semistuctured Data

With the tremendous growth of World Wide Web (WWW) data, there is an emerging need for effective information retrieval at the document level. Several query languages such as XML-QL, XPath, XQL, Quilt and XQuery are proposed in recent years to provide faster way of querying XML data, but they still lack of generality and efficiency. Our approach towards evolving a framework for querying semistructured documents is based on formal query algebra. Two elements are introduced in the proposed framework: first, a generic and flexible data model for logical representation of semistructured data and second, a set of operators for the manipulation of objects defined in the data model. In additional to accommodating several peculiarities of semistructured data, our model offers novel features such as bidirectional paths for navigational querying and partitions for data transformation that are not available in other proposals.

SMaTTS: Standard Malay Text to Speech System

This paper presents a rule-based text- to- speech (TTS) Synthesis System for Standard Malay, namely SMaTTS. The proposed system using sinusoidal method and some pre- recorded wave files in generating speech for the system. The use of phone database significantly decreases the amount of computer memory space used, thus making the system very light and embeddable. The overall system was comprised of two phases the Natural Language Processing (NLP) that consisted of the high-level processing of text analysis, phonetic analysis, text normalization and morphophonemic module. The module was designed specially for SM to overcome few problems in defining the rules for SM orthography system before it can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) which operated on the low-level process of the speech waveform generation. A developed an intelligible and adequately natural sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay Language (SM) phoneme set and an inclusive set of phone database have been constructed carefully for this phone-based speech synthesizer. By applying the generative phonology, a comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been invented for SMaTTS. As for the evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was compiled and several experiments have been performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvements was thoroughly discussed.

Distributional Semantics Approach to Thai Word Sense Disambiguation

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledgebased, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning, to the task of Thai noun and verbal word sense disambiguation. The Latent Semantic Indexing has been shown to be efficient and effective for Information Retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely  /hua4/ and /kep1/ that are used as a representative of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.

An Investigation on the Variation of Software Development Productivity

The productivity of software development is one of the major concerns for project managers. Given the increasing complexity of the software being developed and the concomitant rise in the typical project size, the productivity has not consistently improved. By analyzing the latest release of ISBSG data repository with 4106 projects ever developed, we report on the factors found to significantly influence productivity, and present an original model for the estimation of productivity during project design. We further illustrate that software development productivity has experienced irregular variations between the years 1995 and 2005. Considering the factors significant to productivity, we found its variations are primarily caused by the variations of average team size for the development and the unbalanced use of the less productive development language 3GL.

A Multilanguage Source Code Retrieval System Using Structural-Semantic Fingerprints

Source code retrieval is of immense importance in the software engineering field. The complex tasks of retrieving and extracting information from source code documents is vital in the development cycle of the large software systems. The two main subtasks which result from these activities are code duplication prevention and plagiarism detection. In this paper, we propose a Mohamed Amine Ouddan, and Hassane Essafi source code retrieval system based on two-level fingerprint representation, respectively the structural and the semantic information within a source code. A sequence alignment technique is applied on these fingerprints in order to quantify the similarity between source code portions. The specific purpose of the system is to detect plagiarism and duplicated code between programs written in different programming languages belonging to the same class, such as C, Cµ, Java and CSharp. These four languages are supported by the actual version of the system which is designed such that it may be easily adapted for any programming language.

Energy Conscious Builder Design Pattern with C# and Intermediate Language

Design Patterns have gained more and more acceptances since their emerging in software development world last decade and become another de facto standard of essential knowledge for Object-Oriented Programming developers nowadays. Their target usage, from the beginning, was for regular computers, so, minimizing power consumption had never been a concern. However, in this decade, demands of more complicated software for running on mobile devices has grown rapidly as the much higher performance portable gadgets have been supplied to the market continuously. To get along with time to market that is business reason, the section of software development for power conscious, battery, devices has shifted itself from using specific low-level languages to higher level ones. Currently, complicated software running on mobile devices are often developed by high level languages those support OOP concepts. These cause the trend of embracing Design Patterns to mobile world. However, using Design Patterns directly in software development for power conscious systems is not recommended because they were not originally designed for such environment. This paper demonstrates the adapted Design Pattern for power limitation system. Because there are numerous original design patterns, it is not possible to mention the whole at once. So, this paper focuses only in creating Energy Conscious version of existing regular "Builder Pattern" to be appropriated for developing low power consumption software.

Using the PGAS Programming Paradigm for Biological Sequence Alignment on a Chip Multi-Threading Architecture

The Partitioned Global Address Space (PGAS) programming paradigm offers ease-of-use in expressing parallelism through a global shared address space while emphasizing performance by providing locality awareness through the partitioning of this address space. Therefore, the interest in PGAS programming languages is growing and many new languages have emerged and are becoming ubiquitously available on nearly all modern parallel architectures. Recently, new parallel machines with multiple cores are designed for targeting high performance applications. Most of the efforts have gone into benchmarking but there are a few examples of real high performance applications running on multicore machines. In this paper, we present and evaluate a parallelization technique for implementing a local DNA sequence alignment algorithm using a PGAS based language, UPC (Unified Parallel C) on a chip multithreading architecture, the UltraSPARC T1.

Analysis and Prototyping of Biological Systems: the Abstract Biological Process Model

The aim of a biological model is to understand the integrated structure and behavior of complex biological systems as a function of the underlying molecular networks to achieve simulation and forecast of their operation. Although several approaches have been introduced to take into account structural and environment related features, relatively little attention has been given to represent the behavior of biological systems. The Abstract Biological Process (ABP) model illustrated in this paper is an object-oriented model based on UML (the standard object-oriented language). Its main objective is to bring into focus the functional aspects of the biological system under analysis.

Object-Oriented Simulation of Simulating Anticipatory Systems

The present paper is oriented to problems of simulation of anticipatory systems, namely those that use simulation models for the aid of anticipation. A certain analogy between use of simulation and imagining will be applied to make the explication more comprehensible. The paper will be completed by notes of problems and by some existing applications. The problems consist in the fact that simulation of the mentioned anticipatory systems end is simulation of simulating systems, i.e. in computer models handling two or more modeled time axes that should be mapped to real time flow in a nondescent manner. Languages oriented to objects, processes and blocks can be used to surmount the problems.

Some Computational Results on MPI Parallel Implementation of Dense Simplex Method

There are two major variants of the Simplex Algorithm: the revised method and the standard, or tableau method. Today, all serious implementations are based on the revised method because it is more efficient for sparse linear programming problems. Moreover, there are a number of applications that lead to dense linear problems so our aim in this paper is to present some computational results on parallel implementation of dense Simplex Method. Our implementation is implemented on a SMP cluster using C programming language and the Message Passing Interface MPI. Preliminary computational results on randomly generated dense linear programs support our results.

Do C-Test and Cloze Procedure Measure what they Purport to be Measuring? A Case of Criterion-Related Validity

This article investigated the validity of C-test and Cloze test which purport to measure general English proficiency. To provide empirical evidence pertaining to the validity of the interpretations based on the results of these integrative language tests, their criterion-related validity was investigated. In doing so, the test of English as a foreign language (TOEFL) which is an established, standardized, and internationally administered test of general English proficiency was used as the criterion measure. Some 90 Iranian English majors participated in this study. They were seniors studying English at a university in Tehran, Iran. The results of analyses showed that there is a statistically significant correlation among participants- scores on Cloze test, C-test, and the TOEFL. Building on the findings of the study and considering criterion-related validity as the evidential basis of the validity argument, it was cautiously deducted that these tests measure the same underlying trait. However, considering the limitations of using criterion measures to validate tests, no absolute claims can be made as to the construct validity of these integrative tests.

Author's Approach to the Problem of Correctional Speech Therapy with Children Suffering from Alalia

In this article we present a methodology which enables preschool and primary school unlanguaged children to remember words, phrases and texts with the help of graphic signs - letters, syllables and words. Reading for a child becomes a support for speech development. Teaching is based on the principle "from simple to complex", "a letter - a syllable - a word - a proposal - a text." Availability of multi-level texts allows using this methodology for working with children who have different levels of speech development.

Design of Domain-Specific Software Systems with Parametric Code Templates

Domain-specific languages describe specific solutions to problems in the application domain. Traditionally they form a solution composing black-box abstractions together. This, usually, involves non-deep transformations over the target model. In this paper we argue that it is potentially powerful to operate with grey-box abstractions to build a domain-specific software system. We present parametric code templates as grey-box abstractions and conceptual tools to encapsulate and manipulate these templates. Manipulations introduce template-s merging routines and can be defined in a generic way. This involves reasoning mechanisms at the code templates level. We introduce the concept of Neurath Modelling Language (NML) that operates with parametric code templates and specifies a visualisation mapping mechanism for target models. Finally we provide an example of calculating a domain-specific software system with predefined NML elements.