Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database

The increasing amount of collected data has limited the performance of the current analyzing algorithms. Thus, developing new cost-effective algorithms in terms of complexity, scalability, and accuracy raised significant interests. In this paper, a modified effective k-means based algorithm is developed and experimented. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block distance and a new stop criterion to guarantee the convergence. Conducted experiments on a real data set show its high performance when compared with the original k-means version.

Three Tier Indoor Localization System for Digital Forensics

Mobile localization has attracted a great deal of attention recently due to the introduction of wireless networks. Although several localization algorithms and systems have been implemented and discussed in the literature, very few researchers have exploited the gap that exists between indoor localization, tracking, external storage of location information and outdoor localization for the purpose of digital forensics during and after a disaster. The contribution of this paper lies in the implementation of a robust system that is capable of locating, tracking mobile device users and store location information for both indoor and partially outdoor the cloud. The system can be used during disaster to track and locate mobile phone users. The developed system is a mobile application built based on Android, Hypertext Preprocessor (PHP), Cascading Style Sheets (CSS), JavaScript and MATLAB for the Android mobile users. Using Waterfall model of software development, we have implemented a three level system that is able to track, locate and store mobile device information in secure database (cloud) on almost a real time basis. The outcome of the study showed that the developed system is efficient with regard to the tracking and locating mobile devices. The system is also flexible, i.e. can be used in any building with fewer adjustments. Finally, the system is accurate for both indoor and outdoor in terms of locating and tracking mobile devices.

Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features

Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.

Multi-Agent System for Irrigation Using Fuzzy Logic Algorithm and Open Platform Communication Data Access

Automatic irrigation systems usually conveniently protect landscape investment. While conventional irrigation systems are known to be inefficient, automated ones have the potential to optimize water usage. In fact, there is a new generation of irrigation systems that are smart in the sense that they monitor the weather, soil conditions, evaporation and plant water use, and automatically adjust the irrigation schedule. In this paper, we present an agent based smart irrigation system. The agents are built using a mix of commercial off the shelf software, including MATLAB, Microsoft Excel and KEPServer Ex5 OPC server, and custom written code. The Irrigation Scheduler Agent uses fuzzy logic to integrate the information that affect the irrigation schedule. In addition, the Multi-Agent system uses Open Platform Connectivity (OPC) technology to share data. OPC technology enables the Irrigation Scheduler Agent to communicate over the Internet, making the system scalable to a municipal or regional agent based water monitoring, management, and optimization system. Finally, this paper presents simulation and pilot installation test result that show the operational effectiveness of our system.

Influence of Local Soil Conditions on Optimal Load Factors for Seismic Design of Buildings

Optimal load factors (dead, live and seismic) used for the design of buildings may be different, depending of the seismic ground motion characteristics to which they are subjected, which are closely related to the type of soil conditions where the structures are located. The influence of the type of soil on those load factors, is analyzed in the present study. A methodology that is useful for establishing optimal load factors that minimize the cost over the life cycle of the structure is employed; and as a restriction, it is established that the probability of structural failure must be less than or equal to a prescribed value. The life-cycle cost model used here includes different types of costs. The optimization methodology is applied to two groups of reinforced concrete buildings. One set (consisting on 4-, 7-, and 10-story buildings) is located on firm ground (with a dominant period Ts=0.5 s) and the other (consisting on 6-, 12-, and 16-story buildings) on soft soil (Ts=1.5 s) of Mexico City. Each group of buildings is designed using different combinations of load factors. The statistics of the maximums inter-story drifts (associated with the structural capacity) are found by means of incremental dynamic analyses. The buildings located on firm zone are analyzed under the action of 10 strong seismic records, and those on soft zone, under 13 strong ground motions. All the motions correspond to seismic subduction events with magnitudes M=6.9. Then, the structural damage and the expected total costs, corresponding to each group of buildings, are estimated. It is concluded that the optimal load factors combination is different for the design of buildings located on firm ground than that for buildings located on soft soil.

Educational Knowledge Transfer in Indigenous Mexican Areas Using Cloud Computing

This work proposes a Cooperation-Competitive (Coopetitive) approach that allows coordinated work among the Secretary of Public Education (SEP), the Autonomous University of Querétaro (UAQ) and government funds from National Council for Science and Technology (CONACYT) or some other international organizations. To work on an overall knowledge transfer strategy with e-learning over the Cloud, where experts in junior high and high school education, working in multidisciplinary teams, perform analysis, evaluation, design, production, validation and knowledge transfer at large scale using a Cloud Computing platform. Allowing teachers and students to have all the information required to ensure a homologated nationally knowledge of topics such as mathematics, statistics, chemistry, history, ethics, civism, etc. This work will start with a pilot test in Spanish and initially in two regional dialects Otomí and Náhuatl. Otomí has more than 285,000 speaking indigenes in Queretaro and Mexico´s central region. Náhuatl is number one indigenous dialect spoken in Mexico with more than 1,550,000 indigenes. The phase one of the project takes into account negotiations with indigenous tribes from different regions, and the Information and Communication technologies to deliver the knowledge to the indigenous schools in their native dialect. The methodology includes the following main milestones: Identification of the indigenous areas where Otomí and Náhuatl are the spoken dialects, research with the SEP the location of actual indigenous schools, analysis and inventory or current schools conditions, negotiation with tribe chiefs, analysis of the technological communication requirements to reach the indigenous communities, identification and inventory of local teachers technology knowledge, selection of a pilot topic, analysis of actual student competence with traditional education system, identification of local translators, design of the e-learning platform, design of the multimedia resources and storage strategy for “Cloud Computing”, translation of the topic to both dialects, Indigenous teachers training, pilot test, course release, project follow up, analysis of student requirements for the new technological platform, definition of a new and improved proposal with greater reach in topics and regions. Importance of phase one of the project is multiple, it includes the proposal of a working technological scheme, focusing in the cultural impact in Mexico so that indigenous tribes can improve their knowledge about new forms of crop improvement, home storage technologies, proven home remedies for common diseases, ways of preparing foods containing major nutrients, disclose strengths and weaknesses of each region, communicating through cloud computing platforms offering regional products and opening communication spaces for inter-indigenous cultural exchange.

ParkedGuard: An Efficient and Accurate Parked Domain Detection System Using Graphical Locality Analysis and Coarse-To-Fine Strategy

As world wild internet has non-stop developments, making profit by lending registered domain names emerges as a new business in recent years. Unfortunately, the larger the market scale of domain lending service becomes, the riskier that there exist malicious behaviors or malwares hiding behind parked domains will be. Also, previous work for differentiating parked domain suffers two main defects: 1) too much data-collecting effort and CPU latency needed for features engineering and 2) ineffectiveness when detecting parked domains containing external links that are usually abused by hackers, e.g., drive-by download attack. Aiming for alleviating above defects without sacrificing practical usability, this paper proposes ParkedGuard as an efficient and accurate parked domain detector. Several scripting behavioral features were analyzed, while those with special statistical significance are adopted in ParkedGuard to make feature engineering much more cost-efficient. On the other hand, finding memberships between external links and parked domains was modeled as a graph mining problem, and a coarse-to-fine strategy was elaborately designed by leverage the graphical locality such that ParkedGuard outperforms the state-of-the-art in terms of both recall and precision rates.

Non-Convex Multi Objective Economic Dispatch Using Ramp Rate Biogeography Based Optimization

Multi objective non-convex economic dispatch problems of a thermal power plant are of grave concern for deciding the cost of generation and reduction of emission level for diminishing the global warming level for improving green-house effect. This paper deals with ramp rate constraints for achieving better inequality constraints so as to incorporate valve point loading for cost of generation in thermal power plant through ramp rate biogeography based optimization involving mutation and migration. Through 50 out of 100 trials, the cost function and emission objective function were found to have outperformed other classical methods such as lambda iteration method, quadratic programming method and many heuristic methods like particle swarm optimization method, weight improved particle swarm optimization method, constriction factor based particle swarm optimization method, moderate random particle swarm optimization method etc. Ramp rate biogeography based optimization applications prove quite advantageous in solving non convex multi objective economic dispatch problems subjected to nonlinear loads that pollute the source giving rise to third harmonic distortions and other such disturbances.

Ramp Rate and Constriction Factor Based Dual Objective Economic Load Dispatch Using Particle Swarm Optimization

Economic Load Dispatch (ELD) proves to be a vital optimization process in electric power system for allocating generation amongst various units to compute the cost of generation, the cost of emission involving global warming gases like sulphur dioxide, nitrous oxide and carbon monoxide etc. In this dissertation, we emphasize ramp rate constriction factor based particle swarm optimization (RRCPSO) for analyzing various performance objectives, namely cost of generation, cost of emission, and a dual objective function involving both these objectives through the experimental simulated results. A 6-unit 30 bus IEEE test case system has been utilized for simulating the results involving improved weight factor advanced ramp rate limit constraints for optimizing total cost of generation and emission. This method increases the tendency of particles to venture into the solution space to ameliorate their convergence rates. Earlier works through dispersed PSO (DPSO) and constriction factor based PSO (CPSO) give rise to comparatively higher computational time and less good optimal solution at par with current dissertation. This paper deals with ramp rate and constriction factor based well defined ramp rate PSO to compute various objectives namely cost, emission and total objective etc. and compares the result with DPSO and weight improved PSO (WIPSO) techniques illustrating lesser computational time and better optimal solution. 

Performance Analysis of the Time-Based and Periodogram-Based Energy Detector for Spectrum Sensing

Classically, an energy detector is implemented in time domain (TD). However, frequency domain (FD) based energy detector has demonstrated an improved performance. This paper presents a comparison between the two approaches as to analyze their pros and cons. A detailed performance analysis of the classical TD energy-detector and the periodogram based detector is performed. Exact and approximate mathematical expressions for probability of false alarm (Pf) and probability of detection (Pd) are derived for both approaches. The derived expressions naturally lead to an analytical as well as intuitive reasoning for the improved performance of (Pf) and (Pd) in different scenarios. Our analysis suggests the dependence improvement on buffer sizes. Pf is improved in FD, whereas Pd is enhanced in TD based energy detectors. Finally, Monte Carlo simulations results demonstrate the analysis reached by the derived expressions.

Design and Implementation of Medium Access Control Based Routing on Real Wireless Sensor Networks Testbed

IEEE 802.15.4 is a Low Rate Wireless Personal Area Networks (LR-WPAN) standard combined with ZigBee, which is going to enable new applications in Wireless Sensor Networks (WSNs) and Internet of Things (IoT) domain. In recent years, it has become a popular standard for WSNs. Wireless communication among sensor motes, enabled by IEEE 802.15.4 standard, is extensively replacing the existing wired technology in a wide range of monitoring and control applications. Researchers have proposed a routing framework and mechanism that interacts with the IEEE 802.15.4 standard using software platform. In this paper, we have designed and implemented MAC based routing (MBR) based on IEEE 802.15.4 standard using a hardware platform “SENSEnuts”. The experimental results include data through light and temperature sensors obtained from communication between PAN coordinator and source node through coordinator, MAC address of some modules used in the experimental setup, topology of the network created for simulation and the remaining battery power of the source node. Our experimental effort on a WSN Testbed has helped us in bridging the gap between theoretical and practical aspect of implementing IEEE 802.15.4 for WSNs applications.

Seismic Vulnerability of Structures Designed in Accordance with the Allowable Stress Design and Load Resistant Factor Design Methods

The method selected for the design of structures not only can affect their seismic vulnerability but also can affect their construction cost. For the design of steel structures, two distinct methods have been introduced by existing codes, namely allowable stress design (ASD) and load resistant factor design (LRFD). This study investigates the effect of using the aforementioned design methods on the seismic vulnerability and construction cost of steel structures. Specifically, a 20-story building equipped with special moment resisting frame and an eccentrically braced system was selected for this study. The building was designed for three different intensities of peak ground acceleration including 0.2 g, 0.25 g, and 0.3 g using the ASD and LRFD methods. The required sizes of beams, columns, and braces were obtained using response spectrum analysis. Then, the designed frames were subjected to nine natural earthquake records which were scaled to the designed response spectrum. For each frame, the base shear, story shears, and inter-story drifts were calculated and then were compared. Results indicated that the LRFD method led to a more economical design for the frames. In addition, the LRFD method resulted in lower base shears and larger inter-story drifts when compared with the ASD method. It was concluded that the application of the LRFD method not only reduced the weights of structural elements but also provided a higher safety margin against seismic actions when compared with the ASD method.

Design of 900 MHz High Gain SiGe Power Amplifier with Linearity Improved Bias Circuit

A 900 MHz three-stage SiGe power amplifier (PA) with high power gain is presented in this paper. Volterra Series is applied to analyze nonlinearity sources of SiGe HBT device model clearly. Meanwhile, the influence of operating current to IMD3 is discussed. Then a β-helper current mirror bias circuit is applied to improve linearity, since the β-helper current mirror bias circuit can offer stable base biasing voltage. Meanwhile, it can also work as predistortion circuit when biasing voltages of three bias circuits are fine-tuned, by this way, the power gain and operating current of PA are optimized for best linearity. The three power stages which fabricated by 0.18 μm SiGe technology are bonded to the printed circuit board (PCB) to obtain impedances by Load-Pull system, then matching networks are done for best linearity with discrete passive components on PCB. The final measured three-stage PA exhibits 21.1 dBm of output power at 1 dB compression point (OP1dB) with power added efficiency (PAE) of 20.6% and 33 dB power gain under 3.3 V power supply voltage.

Evaluation of Applicability of High Strength Stirrup for Prestressed Concrete Members

Recently, the use of high-strength materials is increasing as the construction of large structures and high-rise structures increases. This paper presents an analysis of the shear behavior of prestressed concrete members with various types of materials by simulating a finite element (FE) analysis. The analytical results indicated that the shear strength and shear failure mode were strongly influenced by not only the shear reinforcement ratio but also the yield strength of shear reinforcement and the compressive strength of concrete. Though the yield strength of shear reinforcement increased the shear strength of prestressed concrete members, there was a limit to the increase in strength because of the change of shear failure modes. According to the results of FE analysis on various parameters, the maximum yield strength of the steel stirrup that can be applied to prestressed concrete members was about 860 MPa.

Chilean Business Orientalism: The Role of Non-State Actors in the Frame of Asymmetric Bilateral Relations

The current research paper assesses how the narrative of Chilean businesspeople about China shapes a new Orientalism Analyses on the role of non-state actors in foreign policy that have hitherto theorized about Orientalism as a narrative of hegemonic power. Hence, it has been instrumental to the efforts of imperialist powers to justify their mission civilisatrice. However, such conceptualization can seldom explain new complexities of international interactions at the height of globalization. Hence, we assessed the case of Chile, a small Latin American country, and its relationship with China, its largest trading partner. Through a discourse analysis of interviews with Chilean businesspeople engaged in the Chinese market, we could determine that Chile is building an Orientalist image of China. This new business Orientalism reinforces a relation of alterity based on commercial opportunities, traditional values, and natural dispositions. Hence, the perception of the Chinese Other amongst Chilean business people frames a new set of representations as part of the essentially commercial nature of current bilateral relations. It differs from previous frames, such as the racial bias frame of the early 20th century, or the anti-communist frame in reaction to Mao’s leadership. As in every narrative of alterity, there is not only a construction of the Other but also a definition of the Self. Consequently, this analysis constitutes a relevant case of the role of non-state actors in asymmetrical bilateral relations, where the non-state actors of the minor power build and act upon an Orientalist frame, which is not representative of its national status in the relation. This study emerges as a contribution on the relation amongst non-state actors in asymmetrical relations, where the smaller power’s business class acts on a negative prejudice of its interactions with its counterpart. The research builds upon the constructivist approach to international relations, linking the idea of Nation Branding with Orientalism in the case of Chile-China relations.

Effects of the In-Situ Upgrading Project in Afghanistan: A Case Study on the Formally and Informally Developed Areas in Kabul

Cities in Afghanistan have been rapidly urbanized; however, many parts of these cities have been developed with no detailed land use plan or infrastructure. In other words, they have been informally developed without any government leadership. The new government started the In-situ Upgrading Project in Kabul to upgrade roads, the water supply network system, and the surface water drainage system on the existing street layout in 2002, with the financial support of international agencies. This project is an appropriate emergency improvement for living life, but not an essential improvement of living conditions and infrastructure problems because the life expectancies of the improved facilities are as short as 10–15 years, and residents cannot obtain land tenure in the unplanned areas. The Land Readjustment System (LRS) conducted in Japan has good advantages that rearrange irregularly shaped land lots and develop the infrastructure effectively. This study investigates the effects of the In-situ Upgrading Project on private investment, land prices, and residents’ satisfaction with projects in Kart-e-Char, where properties are registered, and in Afshar-e-Silo Lot 1, where properties are unregistered. These projects are located 5 km and 7 km from the CBD area of Kabul, respectively. This study discusses whether LRS should be applied to the unplanned area based on the questionnaire and interview responses of experts experienced in the In-situ Upgrading Project who have knowledge of LRS. The analysis results reveal that, in Kart-e-Char, a lot of private investment has been made in the construction of medium-rise (five- to nine-story) buildings for commercial and residential purposes. Land values have also incrementally increased since the project, and residents are commonly satisfied with the road pavement, drainage systems, and water supplies, but dissatisfied with the poor delivery of electricity as well as the lack of public facilities (e.g., parks and sport facilities). In Afshar-e-Silo Lot 1, basic infrastructures like paved roads and surface water drainage systems have improved from the project. After the project, a few four- and five-story residential buildings were built with very low-level private investments, but significant increases in land prices were not evident. The residents are satisfied with the contribution ratio, drainage system, and small increase in land price, but there is still no drinking water supply system or tenure security; moreover, there are substandard paved roads and a lack of public facilities, such as parks, sport facilities, mosques, and schools. The results of the questionnaire and interviews with the four engineers highlight the problems that remain to be solved in the unplanned areas if LRS is applied—namely, land use differences, types and conditions of the infrastructure still to be installed by the project, and time spent for positive consensus building among the residents, given the project’s budget limitation.

Moving Object Detection Using Histogram of Uniformly Oriented Gradient

Moving object detection (MOD) is an important issue in advanced driver assistance systems (ADAS). There are two important moving objects, pedestrians and scooters in ADAS. In real-world systems, there exist two important challenges for MOD, including the computational complexity and the detection accuracy. The histogram of oriented gradient (HOG) features can easily detect the edge of object without invariance to changes in illumination and shadowing. However, to reduce the execution time for real-time systems, the image size should be down sampled which would lead the outlier influence to increase. For this reason, we propose the histogram of uniformly-oriented gradient (HUG) features to get better accurate description of the contour of human body. In the testing phase, the support vector machine (SVM) with linear kernel function is involved. Experimental results show the correctness and effectiveness of the proposed method. With SVM classifiers, the real testing results show the proposed HUG features achieve better than classification performance than the HOG ones.

Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Impact of Long Term Application of Municipal Solid Waste on Physicochemical and Microbial Parameters and Heavy Metal Distribution in Soils in Accordance to Its Agricultural Uses

Municipal Solid Waste (MSW), being a rich source of organic materials, can be used for agricultural applications as an important source of nutrients for soil and plants. This is also an alternative beneficial management practice for MSW generated in developing countries. In the present study, MSW treated soil samples from last four to six years at farmer’s field in Rohtak and Gurgaon states (Haryana, India) were collected. The samples were analyzed for all-important agricultural parameters and compared with the control untreated soil samples. The treated soil at farmer’s field showed increase in total N by 48 to 68%, P by 45.7 to 51.3%, and K by 60 to 67% compared to untreated soil samples. Application of sewage sludge at different sites led to increase in microbial biomass C by 60 to 68% compared to untreated soil. There was significant increase in total Cu, Cr, Ni, Fe, Pb, and Zn in all sewage sludge amended soil samples; however, concentration of all the metals were still below the current permitted (EU) limits. To study the adverse effect of heavy metals accumulation on various soil microbial activities, the sewage sludge samples (from wastewater treatment plant at Gurgaon) were artificially contaminated with heavy metal concentration above the EU limits. They were then applied to soil samples with different rates (0.5 to 4.0%) and incubated for 90 days under laboratory conditions. The samples were drawn at different intervals and analyzed for various parameters like pH, EC, total N, P, K, microbial biomass C, carbon mineralization, and diethylenetriaminepentaacetic acid (DTPA) exactable heavy metals. The results were compared to the uncontaminated sewage sludge. The increasing level of sewage sludge from 0.5 to 4% led to build of organic C and total N, P and K content at the early stages of incubation. But, organic C was decreased after 90 days because of decomposition of organic matter. Biomass production was significantly increased in both contaminated and uncontaminated sewage soil samples, but also led to slight increases in metal accumulation and their bioavailability in soil. The maximum metal concentrations were found in treatment with 4% of contaminated sewage sludge amendment.