Simulation Data Summarization Based on Spatial Histograms

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Key Frame Based Video Summarization via Dependency Optimization

As a rapid growth of digital videos and data communications, video summarization that provides a shorter version of the video for fast video browsing and retrieval is necessary. Key frame extraction is one of the mechanisms to generate video summary. In general, the extracted key frames should both represent the entire video content and contain minimum redundancy. However, most of the existing approaches heuristically select key frames; hence, the selected key frames may not be the most different frames and/or not cover the entire content of a video. In this paper, we propose a method of video summarization which provides the reasonable objective functions for selecting key frames. In particular, we apply a statistical dependency measure called quadratic mutual informaion as our objective functions for maximizing the coverage of the entire video content as well as minimizing the redundancy among selected key frames. The proposed key frame extraction algorithm finds key frames as an optimization problem. Through experiments, we demonstrate the success of the proposed video summarization approach that produces video summary with better coverage of the entire video content while less redundancy among key frames comparing to the state-of-the-art approaches.

Linguistic Summarization of Structured Patent Data

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Surveillance Video Summarization Based on Histogram Differencing and Sum Conditional Variance

For more efficient and fast video summarization, this paper presents a surveillance video summarization method. The presented method works to improve video summarization technique. This method depends on temporal differencing to extract most important data from large video stream. This method uses histogram differencing and Sum Conditional Variance which is robust against to illumination variations in order to extract motion objects. The experimental results showed that the presented method gives better output compared with temporal differencing based summarization techniques.

Video Summarization: Techniques and Applications

Nowadays, huge amount of multimedia repositories make the browsing, retrieval and delivery of video contents very slow and even difficult tasks. Video summarization has been proposed to improve faster browsing of large video collections and more efficient content indexing and access. In this paper, we focus on approaches to video summarization. The video summaries can be generated in many different forms. However, two fundamentals ways to generate summaries are static and dynamic. We present different techniques for each mode in the literature and describe some features used for generating video summaries. We conclude with perspective for further research.

Feature-Based Summarizing and Ranking from Customer Reviews

Due to the rapid increase of Internet, web opinion sources dynamically emerge which is useful for both potential customers and product manufacturers for prediction and decision purposes. These are the user generated contents written in natural languages and are unstructured-free-texts scheme. Therefore, opinion mining techniques become popular to automatically process customer reviews for extracting product features and user opinions expressed over them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique applies for subjectivity classification to improve the mining performance. In this paper, we dedicate our work is the task of opinion summarization. Therefore, product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined by Senti-WordNet Lexicon. The problem of opinion summarization refers how to relate the opinion words with respect to a certain feature. Probabilistic based model of supervised learning will improve the result that is more flexible and effective.

Automatic Enhanced Update Summary Generation System for News Documents

Fast changing knowledge systems on the Internet can be accessed more efficiently with the help of automatic document summarization and updating techniques. The aim of multi-document update summary generation is to construct a summary unfolding the mainstream of data from a collection of documents based on the hypothesis that the user has already read a set of previous documents. In order to provide a lot of semantic information from the documents, deeper linguistic or semantic analysis of the source documents were used instead of relying only on document word frequencies to select important concepts. In order to produce a responsive summary, meaning oriented structural analysis is needed. To address this issue, the proposed system presents a document summarization approach based on sentence annotation with aspects, prepositions and named entities. Semantic element extraction strategy is used to select important concepts from documents which are used to generate enhanced semantic summary.

Fuzzy Sequential Algorithm for Discrimination and Decision Maker in Sporting Events

Events discrimination and decision maker in sport field are the subject of many interesting studies in computer vision and artificial intelligence. A large volume of research has been conducted for automatic semantic event detection and summarization of sports videos. Indeed the results of these researches have a very significant contribution, as well to television broadcasts as to the football teams, since the result of sporting event can be reflected on the economic field. In this paper, we propose a novel fuzzy sequential technique which lead to discriminate events and specify the technico-tactics on going the game, nor the fuzzy system or the sequential one, may be able to respond to the asked question, in fact fuzzy process is not sufficient, it does not respect the chronological order according the time of various events, similarly the sequential process needs flexibility about the parameters used in this study, it may affect a membership degree of each parameter on the one hand and respect the sequencing of events for each frame on the other hand. Indeed this technique describes special events such as dribbling, headings, short sprints, rapid acceleration or deceleration, turning, jumping, kicking, ball occupation, and tackling according velocity vectors of the two players and the ball direction.

Automatic Text Summarization

This work proposes an approach to address automatic text summarization. This approach is a trainable summarizer, which takes into account several features, including sentence position, positive keyword, negative keyword, sentence centrality, sentence resemblance to the title, sentence inclusion of name entity, sentence inclusion of numerical data, sentence relative length, Bushy path of the sentence and aggregated similarity for each sentence to generate summaries. First we investigate the effect of each sentence feature on the summarization task. Then we use all features score function to train genetic algorithm (GA) and mathematical regression (MR) models to obtain a suitable combination of feature weights. The proposed approach performance is measured at several compression rates on a data corpus composed of 100 English religious articles. The results of the proposed approach are promising.

Graph-Based Text Similarity Measurement by Exploiting Wikipedia as Background Knowledge

Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on Vector Space Model (VSM) more or less suffer from the limitation of Bag of Words (BOW), which ignores the semantic relationship among words. Enriching document representation with background knowledge from Wikipedia is proven to be an effective way to solve this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement which goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. The experimental results on two different datasets show that our approach significantly improves VSM-based methods in both text clustering and classification.

MTSSM - A Framework for Multi-Track Segmentation of Symbolic Music

Music segmentation is a key issue in music information retrieval (MIR) as it provides an insight into the internal structure of a composition. Structural information about a composition can improve several tasks related to MIR such as searching and browsing large music collections, visualizing musical structure, lyric alignment, and music summarization. The authors of this paper present the MTSSM framework, a twolayer framework for the multi-track segmentation of symbolic music. The strength of this framework lies in the combination of existing methods for local track segmentation and the application of global structure information spanning via multiple tracks. The first layer of the MTSSM uses various string matching techniques to detect the best candidate segmentations for each track of a multi-track composition independently. The second layer combines all single track results and determines the best segmentation for each track in respect to the global structure of the composition.

Wireless Sensor Networks:Delay Guarentee and Energy Efficient MAC Protocols

Wireless sensor networks is an emerging technology that serves as environment monitors in many applications. Yet these miniatures suffer from constrained resources in terms of computation capabilities and energy resources. Limited energy resource in these nodes demands an efficient consumption of that resource either by developing the modules itself or by providing an efficient communication protocols. This paper presents a comprehensive summarization and a comparative study of the available MAC protocols proposed for Wireless Sensor Networks showing their capabilities and efficiency in terms of energy consumption and delay guarantee.

Using Heuristic Rules from Sentence Decomposition of Experts- Summaries to Detect Students- Summarizing Strategies

Summarizing skills have been introduced to English syllabus in secondary school in Malaysia to evaluate student-s comprehension for a given text where it requires students to employ several strategies to produce the summary. This paper reports on our effort to develop a computer-based summarization assessment system that detects the strategies used by the students in producing their summaries. Sentence decomposition of expert-written summaries is used to analyze how experts produce their summary sentences. From the analysis, we identified seven summarizing strategies and their rules which are then transformed into a set of heuristic rules on how to determine the summarizing strategies. We developed an algorithm based on the heuristic rules and performed some experiments to evaluate and support the technique proposed.

FSM-based Recognition of Dynamic Hand Gestures via Gesture Summarization Using Key Video Object Planes

The use of human hand as a natural interface for humancomputer interaction (HCI) serves as the motivation for research in hand gesture recognition. Vision-based hand gesture recognition involves visual analysis of hand shape, position and/or movement. In this paper, we use the concept of object-based video abstraction for segmenting the frames into video object planes (VOPs), as used in MPEG-4, with each VOP corresponding to one semantically meaningful hand position. Next, the key VOPs are selected on the basis of the amount of change in hand shape – for a given key frame in the sequence the next key frame is the one in which the hand changes its shape significantly. Thus, an entire video clip is transformed into a small number of representative frames that are sufficient to represent a gesture sequence. Subsequently, we model a particular gesture as a sequence of key frames each bearing information about its duration. These constitute a finite state machine. For recognition, the states of the incoming gesture sequence are matched with the states of all different FSMs contained in the database of gesture vocabulary. The core idea of our proposed representation is that redundant frames of the gesture video sequence bear only the temporal information of a gesture and hence discarded for computational efficiency. Experimental results obtained demonstrate the effectiveness of our proposed scheme for key frame extraction, subsequent gesture summarization and finally gesture recognition.

Genetic Mining: Using Genetic Algorithm for Topic based on Concept Distribution

Today, Genetic Algorithm has been used to solve wide range of optimization problems. Some researches conduct on applying Genetic Algorithm to text classification, summarization and information retrieval system in text mining process. This researches show a better performance due to the nature of Genetic Algorithm. In this paper a new algorithm for using Genetic Algorithm in concept weighting and topic identification, based on concept standard deviation will be explored.

Text Summarization for Oil and Gas Drilling Topic

Information sharing and gathering are important in the rapid advancement era of technology. The existence of WWW has caused rapid growth of information explosion. Readers are overloaded with too many lengthy text documents in which they are more interested in shorter versions. Oil and gas industry could not escape from this predicament. In this paper, we develop an Automated Text Summarization System known as AutoTextSumm to extract the salient points of oil and gas drilling articles by incorporating statistical approach, keywords identification, synonym words and sentence-s position. In this study, we have conducted interviews with Petroleum Engineering experts and English Language experts to identify the list of most commonly used keywords in the oil and gas drilling domain. The system performance of AutoTextSumm is evaluated using the formulae of precision, recall and F-score. Based on the experimental results, AutoTextSumm has produced satisfactory performance with F-score of 0.81.

Extraction of Significant Phrases from Text

Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain independent keyphrase extraction algorithm. The algorithm approaches the problem of keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs better than other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000-s AutoSummarize feature. The domain independence of this algorithm has also been confirmed in our experiments.

Text Summarization for Oil and Gas News Article

Information is increasing in volumes; companies are overloaded with information that they may lose track in getting the intended information. It is a time consuming task to scan through each of the lengthy document. A shorter version of the document which contains only the gist information is more favourable for most information seekers. Therefore, in this paper, we implement a text summarization system to produce a summary that contains gist information of oil and gas news articles. The summarization is intended to provide important information for oil and gas companies to monitor their competitor-s behaviour in enhancing them in formulating business strategies. The system integrated statistical approach with three underlying concepts: keyword occurrences, title of the news article and location of the sentence. The generated summaries were compared with human generated summaries from an oil and gas company. Precision and recall ratio are used to evaluate the accuracy of the generated summary. Based on the experimental results, the system is able to produce an effective summary with the average recall value of 83% at the compression rate of 25%.

Principal Component Analysis using Singular Value Decomposition of Microarray Data

A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. Principal component analysis(PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data in order to simplify subsequent analysis and allow for summarization of the data in a parsimonious manner. PCA, which can be implemented via a singular value decomposition(SVD), is useful for analysis of microarray data. For application of PCA using SVD we use the DNA microarray data for the small round blue cell tumors(SRBCT) of childhood by Khan et al.(2001). To decide the number of components which account for sufficient amount of information we draw scree plot. Biplot, a graphic display associated with PCA, reveals important features that exhibit relationship between variables and also the relationship of variables with observations.