Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

Case Study on Innovative Aquatic-Based Bioeconomy for Chlorella sorokiniana

Over the last decade due to climate change and a strategy of natural resources preservation, the interest for the aquatic biomass has dramatically increased. Along with mitigation of the environmental pressure and connection of waste streams (including CO2 and heat emissions), microalgae bioeconomy can supply food, feed, as well as the pharmaceutical and power industry with number of value-added products. Furthermore, in comparison to conventional biomass, microalgae can be cultivated in wide range of conditions without compromising food and feed production, thus addressing issues associated with negative social and the environmental impacts. This paper presents the state-of-the art technology for microalgae bioeconomy from cultivation process to production of valuable components and by-streams. Microalgae Chlorella sorokiniana were cultivated in the pilot-scale innovation concept in Hamburg (Germany) using different systems such as race way pond (5000 L) and flat panel reactors (8 x 180 L). In order to achieve the optimum growth conditions along with suitable cellular composition for the further extraction of the value-added components, process parameters such as light intensity, temperature and pH are continuously being monitored. On the other hand, metabolic needs in nutrients were provided by addition of micro- and macro-nutrients into a medium to ensure autotrophic growth conditions of microalgae. The cultivation was further followed by downstream process and extraction of lipids, proteins and saccharides. Lipids extraction is conducted in repeated-batch semi-automatic mode using hot extraction method according to Randall. As solvents hexane and ethanol are used at different ratio of 9:1 and 1:9, respectively. Depending on cell disruption method along with solvents ratio, the total lipids content showed significant variations between 8.1% and 13.9 %. The highest percentage of extracted biomass was reached with a sample pretreated with microwave digestion using 90% of hexane and 10% of ethanol as solvents. Proteins content in microalgae was determined by two different methods, namely: Total Kejadahl Nitrogen (TKN), which further was converted to protein content, as well as Bradford method using Brilliant Blue G-250 dye. Obtained results, showed a good correlation between both methods with protein content being in the range of 39.8–47.1%. Characterization of neutral and acid saccharides from microalgae was conducted by phenol-sulfuric acid method at two wavelengths of 480 nm and 490 nm. The average concentration of neutral and acid saccharides under the optimal cultivation conditions was 19.5% and 26.1%, respectively. Subsequently, biomass residues are used as substrate for anaerobic digestion on the laboratory-scale. The methane concentration, which was measured on the daily bases, showed some variations for different samples after extraction steps but was in the range between 48% and 55%. CO2 which is formed during the fermentation process and after the combustion in the Combined Heat and Power unit can potentially be used within the cultivation process as a carbon source for the photoautotrophic synthesis of biomass.

Water Quality Determination of River Systems in Antalya Basin by Biomonitoring

For evaluation of water quality of the river systems in Antalya Basin, macrozoobenthos samples were taken from 22 determined stations by a hand net and identified at family level. Water quality of Antalya Basin was determined according to Biological Monitoring Working Party (BMWP) system, by using macrozoobenthic invertebrates and physicochemical parameters. As a result of the evaluation, while Aksu Stream was determined as the most polluted stream in Antalya Basin, Isparta Stream was determined as the most polluted tributary of Aksu Stream. Pollution level of the Isparta Stream was determined as quality class V and it is the extremely polluted part of stream. Pollution loads at the sources of the streams were determined in low levels in general. Due to some parts of the streams have passed through deep canyons and take their sources from nonresidential and non-arable regions, majority of the streams that take place in Antalya Basin are at high quality level. Waste water, which comes from agricultural and residential regions, affects the lower basins of the streams. Because of the waste water, lower parts of the stream basins exposed to the pollution under anthropogenic effects. However, in Aksu Stream, which differs by being exposed to domestic and industrial wastes of Isparta City, extreme pollution was determined, particularly in the Isparta Stream part.

Assessment of Wastewater Reuse Potential for an Enamel Coating Industry

In order to eliminate water scarcity problems, effective precautions must be taken. Growing competition for water is increasingly forcing facilities to tackle their own water scarcity problems. At this point, application of wastewater reclamation and reuse results in considerable economic advantageous. In this study, an enamel coating facility, which is one of the high water consumed facilities, is evaluated in terms of its wastewater reuse potential. Wastewater reclamation and reuse can be defined as one of the best available techniques for this sector. Hence, process and pollution profiles together with detailed characterization of segregated wastewater sources are appraised in a way to find out the recoverable effluent streams arising from enamel coating operations. Daily, 170 m3 of process water is required and 160 m3 of wastewater is generated. The segregated streams generated by two enamel coating processes are characterized in terms of conventional parameters. Relatively clean segregated wastewater streams (reusable wastewaters) are separately collected and experimental treatability studies are conducted on it. The results reflected that the reusable wastewater fraction has an approximate amount of 110 m3/day that accounts for 68% of the total wastewaters. The need for treatment applicable on reusable wastewaters is determined by considering water quality requirements of various operations and characterization of reusable wastewater streams. Ultra-filtration (UF), Nano-filtration (NF) and Reverse Osmosis (RO) membranes are subsequently applied on reusable effluent fraction. Adequate organic matter removal is not obtained with the mentioned treatment sequence.

Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Thermo-Physical Properties and Solubility of CO2 in Piperazine Activated Aqueous Solutions of β-Alanine

Carbon dioxide is one of the major greenhouse gas (GHG) contributors. It is an obligation of the industry to reduce the amount of carbon dioxide emission to the acceptable limits. Tremendous research and studies are reported in the past and still the quest to find the suitable and economical solution of this problem needed to be explored in order to develop the most plausible absorber for carbon dioxide removal. Amino acids can be potential alternate solvents for carbon dioxide capture from gaseous streams. This is due to its ability to resist oxidative degradation, low volatility and its ionic structure. In addition, the introduction of promoter-like piperazine to amino acid helps to further enhance the solubility. In this work, the effect of piperazine on thermo physical properties and solubility of β-Alanine aqueous solutions were studied for various concentrations. The measured physicochemical properties data was correlated as a function of temperature using least-squares method and the correlation parameters are reported together with it respective standard deviations. The effect of activator piperazine on the CO2 loading performance of selected amino acid under high-pressure conditions (1bar to 10bar) at temperature range of (30 to 60)oC was also studied. Solubility of CO2 decreases with increasing temperature and increases with increasing pressure. Quadratic representation of solubility using Response Surface Methodology (RSM) shows that the most important parameter to optimize solubility is system pressure. The addition of promoter increases the solubility effect of the solvent.

Diversity and Structure of Trichoptera Communities and Water Quality Variables in Streams, Northern Thailand

The influence of physicochemical water quality parameters on the abundance and diversity of caddisfly larvae was studied in seven sampling stations in Mae Tao and Mae Ku watersheds, Mae Sot District, Tak Province, northern Thailand. The streams: MK2 and MK8 as reference site, and impacted streams (MT1-MT5) were sampled bi-monthly during July 2011 to May 2012. A total of 4,584 individual of caddisfly larvae belonging to 10 family and 17 genera were found. The larvae of family Hydropsychidae were the most abundance, followed by Philopotamidae, Odontoceridae, and Leptoceridae, respectively. The genus Cheumatopsyche, Hydropsyche, and Chimarra were the most abundance genera in this study. Results of CCA ordination showed the total dissolved solids, sulfate, water temperature, dissolved oxygen and pH were the most important physicochemical factors to affect distribution of caddisflies communities. Changes in the caddisfly fauna may indicate changes in physicochemical factors owing to agricultural pollution, urbanization, or other human activities. Results revealed that the order Trichoptera, identified to species or genus, can be potentially used to assess environmental water quality status in freshwater ecosystems.

Accumulation of Pollutants, Self-purification and Impact on Peripheral Urban Areas: A Case Study in Shantytowns in Argentina

This work sets out to debate the tensions involved in the processes of contamination and self-purification in the urban space, particularly in the streams that run through the Buenos Aires metropolitan area. For much of their course, those streams are piped; their waters do not come into contact with the outdoors until they have reached deeply impoverished urban areas with high levels of environmental contamination. These are peripheral zones that, until thirty years ago, were marshlands and fields. They are now densely populated areas largely lacking in urban infrastructure. The Cárcova neighborhood, where this project is underway, is in the José León Suárez section of General San Martín county, Buenos Aires province. A stretch of José León Suarez canal crosses the neighborhood. Starting upstream, this canal carries pollutants due to the sewage and industrial waste released into it. Further downstream, in the neighborhood, domestic drainage is poured into the stream. In this paper, we formulate a hypothesis diametrical to the one that holds that these neighborhoods are the primary source of contamination, suggesting instead that in the stretch of the canal that runs through the neighborhood the stream’s waters are actually cleaned and the sediments accumulate pollutants. Indeed, the stretches of water that runs through these neighborhoods act as water processing plants for the metropolis. This project has studied the different organic-load polluting contributions to the water in a certain stretch of the canal, the reduction of that load over the course of the canal, and the incorporation of pollutants into the sediments. We have found that the surface water has considerable ability to self-purify, mostly due to processes of sedimentation and adsorption. The polluting load is accumulated in the sediments where that load stabilizes slowly by means of anaerobic processes. In this study, we also investigated the risks of sediment management and the use of the processes studied here in controlled conditions as tools of environmental restoration.

Gammarus:Asellus Ratio as an Index of Organic Pollution – (A Case Study in Markeaton, Kedleston Hall, and Allestree Park Lakes Derby) UK

Macro invertebrates have been used to monitor organic pollution in rivers and streams. Several biotic indices based on macro invertebrates have been developed over the years including the Biological Monitoring Working Party (BMWP). A new biotic index, the Gammarus:Asellus ratio has been recently proposed as an index of organic pollution. This study tested the validity of the Gammarus:Asellus ratio as an index of organic pollution, by examining the relationship between the Gammarus:Asellus ratio and physical chemical parameters, and other biotic indices such as BMWP and, Average Score Per Taxon (ASPT) from lakes and streams at Markeaton Park, Allestree Park and Kedleston Hall, Derbyshire. Macro invertebrates were sampled using the standard five minute kick sampling techniques physical and chemical environmental variables were obtained based on standard sampling techniques. Eighteen sites were sampled, six sites from Markeaton Park (three sites across the stream and three sites across the lake). Six sites each were also sampled from Allestree Park and Kedleston Hall lakes. The Gammarus:Asellus ratio showed an opposite significant positive correlations with parameters indicative of organic pollution such as the level of nitrates, phosphates, and calcium and also revealed a negatively significant correlations with other biotic indices (BMWP/ASPT). The BMWP score correlated positively significantly with some water quality parameters such as dissolved oxygen and flow rate, but revealed no correlations with other chemical environmental variables. The BMWP score was significantly higher in the stream than the lake in Markeaton Park, also The ASPT scores appear to be significantly higher in the upper Lakes than the middle and lower lakes. This study has further strengthened the use of BMWP/ASPT score as an index of organic pollution. But additional application is required to validate the use of Gammarus:Asellus as a rapid bio monitoring tool.

ERP Implementation in Iran: (A Successful Experience in DGC)

Nowadays, the amounts of companies which tend to have an Enterprise Resource Planning (ERP) application are increasing. Although ERP projects are expensive, time consuming, and complex, there are some successful experiences. These days, developing countries are striving to implement ERP projects successfully; however, there are many obstacles. Therefore, these projects would be failed or partially failed. This paper concerns the implementation of a successful ERP implementation, IFS, in Iran at Dana Geophysics Company (DGC). After a short review of ERP and ERP market in Iran, we propose a three phases deployment methodology (phase 1: Preparation and Business Process Management (BPM) phase 2: implementation and phase 3: testing, golive-1 (pilot) and golive-2 (final)). Then, we present five guidelines (Project Management, Change Management, Business Process Management (BPM), Training& Knowledge Management, and Technical Management), which were chose as work streams. In this case study we present lessons learned in Project management and Business process Management.

How Virtualization, Decentralization and Network Building Change the Manufacturing Landscape: An Industry 4.0 Perspective

The German manufacturing industry has to withstand an increasing global competition on product quality and production costs. As labor costs are high, several industries have suffered severely under the relocation of production facilities towards aspiring countries, which have managed to close the productivity and quality gap substantially. Established manufacturing companies have recognized that customers are not willing to pay large price premiums for incremental quality improvements. As a consequence, many companies from the German manufacturing industry adjust their production focusing on customized products and fast time to market. Leveraging the advantages of novel production strategies such as Agile Manufacturing and Mass Customization, manufacturing companies transform into integrated networks, in which companies unite their core competencies. Hereby, virtualization of the process- and supply-chain ensures smooth inter-company operations providing real-time access to relevant product and production information for all participating entities. Boundaries of companies deteriorate, as autonomous systems exchange data, gained by embedded systems throughout the entire value chain. By including Cyber-Physical-Systems, advanced communication between machines is tantamount to their dialogue with humans. The increasing utilization of information and communication technology allows digital engineering of products and production processes alike. Modular simulation and modeling techniques allow decentralized units to flexibly alter products and thereby enable rapid product innovation. The present article describes the developments of Industry 4.0 within the literature and reviews the associated research streams. Hereby, we analyze eight scientific journals with regards to the following research fields: Individualized production, end-to-end engineering in a virtual process chain and production networks. We employ cluster analysis to assign sub-topics into the respective research field. To assess the practical implications, we conducted face-to-face interviews with managers from the industry as well as from the consulting business using a structured interview guideline. The results reveal reasons for the adaption and refusal of Industry 4.0 practices from a managerial point of view. Our findings contribute to the upcoming research stream of Industry 4.0 and support decision-makers to assess their need for transformation towards Industry 4.0 practices. 

Semantic Support for Hypothesis-Based Research from Smart Environment Monitoring and Analysis Technologies

Improvements in the data fusion and data analysis phase of research are imperative due to the exponential growth of sensed data. Currently, there are developments in the Semantic Sensor Web community to explore efficient methods for reuse, correlation and integration of web-based data sets and live data streams. This paper describes the integration of remotely sensed data with web-available static data for use in observational hypothesis testing and the analysis phase of research. The Semantic Reef system combines semantic technologies (e.g., well-defined ontologies and logic systems) with scientific workflows to enable hypothesis-based research. A framework is presented for how the data fusion concepts from the Semantic Reef architecture map to the Smart Environment Monitoring and Analysis Technologies (SEMAT) intelligent sensor network initiative. The data collected via SEMAT and the inferred knowledge from the Semantic Reef system are ingested to the Tropical Data Hub for data discovery, reuse, curation and publication.

Shot Transition Detection with Minimal Decoding of MPEG Video Streams

Digital libraries become more and more necessary in order to support users with powerful and easy-to-use tools for searching, browsing and retrieving media information. The starting point for these tasks is the segmentation of video content into shots. To segment MPEG video streams into shots, a fully automatic procedure to detect both abrupt and gradual transitions (dissolve and fade-groups) with minimal decoding in real time is developed in this study. Each was explored through two phases: macro-block type's analysis in B-frames, and on-demand intensity information analysis. The experimental results show remarkable performance in detecting gradual transitions of some kinds of input data and comparable results of the rest of the examined video streams. Almost all abrupt transitions could be detected with very few false positive alarms.

SUPAR: System for User-Centric Profiling of Association Rules in Streaming Data

With a surge of stream processing applications novel techniques are required for generation and analysis of association rules in streams. The traditional rule mining solutions cannot handle streams because they generally require multiple passes over the data and do not guarantee the results in a predictable, small time. Though researchers have been proposing algorithms for generation of rules from streams, there has not been much focus on their analysis. We propose Association rule profiling, a user centric process for analyzing association rules and attaching suitable profiles to them depending on their changing frequency behavior over a previous snapshot of time in a data stream. Association rule profiles provide insights into the changing nature of associations and can be used to characterize the associations. We discuss importance of characteristics such as predictability of linkages present in the data and propose metric to quantify it. We also show how association rule profiles can aid in generation of user specific, more understandable and actionable rules. The framework is implemented as SUPAR: System for Usercentric Profiling of Association Rules in streaming data. The proposed system offers following capabilities: i) Continuous monitoring of frequency of streaming item-sets and detection of significant changes therein for association rule profiling. ii) Computation of metrics for quantifying predictability of associations present in the data. iii) User-centric control of the characterization process: user can control the framework through a) constraint specification and b) non-interesting rule elimination.

Cumulative Learning based on Dynamic Clustering of Hierarchical Production Rules(HPRs)

An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of details, and to focus our attention on the interesting aspects only. One of such efficient and easy to understand systems is Hierarchical Production rule (HPRs) system. A HPR, a standard production rule augmented with generality and specificity information, is of the following form Decision If < condition> Generality Specificity . HPRs systems are capable of handling taxonomical structures inherent in the knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by a HPR-tree. This paper discusses an algorithm based on cumulative learning scenario for dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from the previous episodes and also maintains summary of clusters as Synopsis to be used in the future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.

Reactive Absorption of Hydrogen Sulfide in Aqueous Ferric Sulfate Solution

Many commercial processes are available for the removal of H2S from gaseous streams. The desulfurization of gas streams using aqueous ferric sulfate solution as washing liquor is studied. Apart from sulfur, only H2O is generated in the process, and consequently, no waste treatment facilities are required. A distinct advantage of the process is that the reaction of H2S with is so rapid and complete that there remains no danger of discharging toxic waste gas. In this study, the reactive absorption of hydrogen sulfide into aqueous ferric sulfate solution has been studied and design calculations for equipments have been done and effective operation parameters on this process considered. Results show that high temperature and low pressure are suitable for absorption reaction. Variation of hydrogen sulfide concentration and Fe3+ concentration with time in absorption reaction shown that the reaction of ferric sulfate and hydrogen sulfide is first order with respect to the both reactant. At low Fe2(SO4)3 concentration the absorption rate of H2S increase with increasing the Fe2(SO4)3 concentration. At higher concentration a decrease in the absorption rate was found. At higher concentration of Fe2(SO4)3, the ionic strength and viscosity of solution increase remarkably resulting in a decrease of solubility, diffusivity and hence absorption rate.

Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.

Optimal Convolutive Filters for Real-Time Detection and Arrival Time Estimation of Transient Signals

Linear convolutive filters are fast in calculation and in application, and thus, often used for real-time processing of continuous data streams. In the case of transient signals, a filter has not only to detect the presence of a specific waveform, but to estimate its arrival time as well. In this study, a measure is presented which indicates the performance of detectors in achieving both of these tasks simultaneously. Furthermore, a new sub-class of linear filters within the class of filters which minimize the quadratic response is proposed. The proposed filters are more flexible than the existing ones, like the adaptive matched filter or the minimum power distortionless response beamformer, and prove to be superior with respect to that measure in certain settings. Simulations of a real-time scenario confirm the advantage of these filters as well as the usefulness of the performance measure.

Content Based Sampling over Transactional Data Streams

This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS as a content based sampling algorithm that works on a landmark window model of data streams and preserve more informed sample in sample space. This algorithm that work based on closed frequent itemset mining tasks, first initiate a concept lattice using initial data, then update lattice structure using an incremental mechanism.Incremental mechanism insert, update and delete nodes in/from concept lattice in batch manner. Presented algorithm extracts the final samples on demand of user. Experimental results show the accuracy of CFISDS on synthetic and real datasets, despite on CFISDS algorithm is not faster than exist sampling algorithms such as Z and DSS.

An Effective Method for Audio Translation between IAX and RSW Protocols

Nowadays, Multimedia Communication has been developed and improved rapidly in order to enable users to communicate between each other over the Internet. In general, the multimedia communication consists of audio and video communication. However, this paper focuses on audio streams. The audio translation between protocols is a very critical issue due to solving the communication problems between any two protocols, as well as it enables people around the world to talk with each other at anywhere and anytime even they use different protocols. In this paper, a proposed method for an audio translation module between two protocols has been presented. These two protocols are InterAsterisk eXchange Protocol (IAX) and Real Time Switching Control Protocol (RSW), which they are widely used to provide two ways audio transfer feature. The result of this work is to introduce possibility of interworking together.