Composite Kernels for Public Emotion Recognition from Twitter

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.

The Use of Software and Internet Search Engines to Develop the Encoding and Decoding Skills of a Dyslexic Learner: A Case Study

This case study explores the impact of two major computer software programs Learn to Speak English and Learn English Spelling and Pronunciation, and some Internet search engines such as Google on mending the decoding and spelling deficiency of Simon X, a dyslexic student. The improvement in decoding and spelling may result in better reading comprehension and composition writing. Some computer programs and Internet materials can help regain the missing awareness and consequently restore his self-confidence and self-esteem. In addition, this study provides a systematic plan comprising a set of activities (four computer programs and Internet materials) which address the problem from the lowest to the highest levels of phoneme and phonological awareness. Four methods of data collection (accounts, observations, published tests, and interviews) create the triangulation to validly and reliably collect data before the plan, during the plan, and after the plan. The data collected are analyzed quantitatively and qualitatively. Sometimes the analysis is either quantitative or qualitative, and some other times a combination of both. Tables and figures are utilized to provide a clear and uncomplicated illustration of some data. The improvement in the decoding, spelling, reading comprehension, and composition writing skills that occurred is proved through the use of authentic materials performed by the student under study. Such materials are a comparison between two sample passages written by the learner before and after the plan, a genuine computer chat conversation, and the scores of the academic year that followed the execution of the plan. Based on these results, the researcher recommends further studies on other Lebanese dyslexic learners using the computer to mend their language problem in order to design and make a most reliable software program that can address this disability more efficiently and successfully.

Crossover Memories and Code-Switching in the Narratives of Arabic-Hebrew and Hebrew-English Bilingual Adults in Israel

This study examines two bilingual phenomena in the narratives of Arabic Hebrew and Hebrew-English bilingual adults in Israel: CO memories and code-switching (CS). The study examined these phenomena in the context of autobiographical memory, using a cue word technique. Student experimenters held two sessions in the homes of the participants. In separate language sessions, the participant was asked to look first at each of 16 cue words and then to state a concrete memory. After stating the memory, participants reported whether their memories were in the same language of the experiment session or different. Memories were classified as ‘Crossovers’ (CO) or ‘Same Language’ (SL) according to participants' self-reports. Participants were also required to elaborate about the setting, interlocutors and other languages involved in the specific memory. Beyond replicating the procedure of cuing technique, one memory from a specific lifespan period was chosen per participant, and the participant was required to provide further details about it. For the more detailed memories, CS count was conducted. Both bilingual groups confirmed the Reminiscence Bump phenomenon, retrieving more memories in the 10-30 age period. CO memories prevailed in second language sessions (L2). Same language memories were more abundant in first language sessions (L1). Higher CS frequency was found in L2 sessions. Finally, as predicted, 'individual' CS was prevalent in L2 sessions, but 'community-based' CS was not higher in L1 sessions. The two bilingual measures in this study, crossovers, and CS came from different research traditions, the former from an experimental paradigm in the psychology of autobiographical memory based on self-reported judgments, the latter a behavioral measure from linguistics. This merger of approaches offers new insight into the field of bilingual autobiographical memory. In addition, the study attempted to shed light on the investigation of motivations for CS, beginning with Walters’ SPPL Model and concluding with a distinction between ‘community-based’ and individual motivations.

Developmental Differences in the Construction of Concepts by Children from 3 to 14-Year-Olds: Perception, Language and Instruction

This study was designed to investigate the relationship between language and children’s construction of the concept of objects, actions, and states. Participants of this study are 120 children whose ages range from 3 to 14 years. Ten children participated from each age group and 10 adults participated as normative group. Data were collected using 28 words which were identified and grouped according to the purpose of this study. Participants were asked the question “What is x?’ for each word in a reserved room. The audio recorded data were transcribed and coded. The data were analyzed primarily qualitatively but quantitatively as well to support qualitative findings. The findings reveal that younger children rely more on their perceptual experience and linguistic input while 7-year-olds and older ones rely more on instructional language in the construction of the concepts related to objects, actions and states. Adults differ from all age groups with their usage of metaphors to refer to objects. It has been noted that linguistic, perceptual and instructional experiences work in an interwoven way but each one seems to be dominant at certain ages.

A Sociolinguistic Study of the Outcomes of Arabic-French Contact in the Algerian Dialect Tlemcen Speech Community as a Case Study

It is acknowledged that our style of speaking changes according to a wide range of variables such as gender, setting, the age of both the addresser and the addressee, the conversation topic, and the aim of the interaction. These differences in style are noticeable in monolingual and multilingual speech communities. Yet, they are more observable in speech communities where two or more codes coexist. The linguistic situation in Algeria reflects a state of bilingualism because of the coexistence of Arabic and French. Nevertheless, like all Arab countries, it is characterized by diglossia i.e. the concomitance of Modern Standard Arabic (MSA) and Algerian Arabic (AA), the former standing for the ‘high variety’ and the latter for the ‘low variety’. The two varieties are derived from the same source but are used to fulfil distinct functions that is, MSA is used in the domains of religion, literature, education and formal settings. AA, on the other hand, is used in informal settings, in everyday speech. French has strongly affected the Algerian language and culture because of the historical background of Algeria, thus, what can easily be noticed in Algeria is that everyday speech is characterized by code-switching from dialectal Arabic and French or by the use of borrowings. Tamazight is also very present in many regions of Algeria and is the mother tongue of many Algerians. Yet, it is not used in the west of Algeria, where the study has been conducted. The present work, which was directed in the speech community of Tlemcen-Algeria, aims at depicting some of the outcomes of the contact of Arabic with French such as code-switching, borrowing and interference. The question that has been asked is whether Algerians are aware of their use of borrowings or not. Three steps are followed in this research; the first one is to depict the sociolinguistic situation in Algeria and to describe the linguistic characteristics of the dialect of Tlemcen, which are specific to this city. The second one is concerned with data collection. Data have been collected from 57 informants who were given questionnaires and who have then been classified according to their age, gender and level of education. Information has also been collected through observation, and note taking. The third step is devoted to analysis. The results obtained reveal that most Algerians are aware of their use of borrowings. The present work clarifies how words are borrowed from French, and then adapted to Arabic. It also illustrates the way in which singular words inflect into plural. The results expose the main characteristics of borrowing as opposed to code-switching. The study also clarifies how interference occurs at the level of nouns, verbs and adjectives.

Teaching Translation in Brazilian Universities: A Study about the Possible Impacts of Translators’ Comments on the Cyberspace about Translator Education

The objective of this paper is to discuss relevant points about teaching translation in Brazilian universities and the possible impacts of blogs and social networks to translator education today. It is intended to analyze the curricula of Brazilian translation courses, contrasting them to information obtained from two social networking groups of great visibility in the area concerning essential characteristics to become a successful profession. Therefore, research has, as its main corpus, a few undergraduate translation programs’ syllabuses, as well as a few postings on social networks groups that specifically share professional opinions regarding the necessity for a translator to obtain a degree in translation to practice the profession. To a certain extent, such comments and their corresponding responses lead to the propagation of discourses which influence the ideas that aspiring translators and recent graduates end up having towards themselves and their undergraduate courses. The postings also show that many professionals do not have a clear position regarding the translator education; while refuting it, they also encourage “free” courses. It is thus observed that cyberspace constitutes, on the one hand, a place of mobilization of people in defense of similar ideas. However, on the other hand, it embodies a place of tension and conflict, in view of the fact that there are many participants and, as in any other situation of interlocution, disagreements may arise. From the postings, aspects related to professionalism were analyzed (including discussions about regulation), as well as questions about the classic dichotomies: theory/practice; art/technique; self-education/academic training. As partial result, the common interest regarding the valorization of the profession could be mentioned, although there is no consensus on the essential characteristics to be a good translator. It was also possible to observe that the set of socially constructed representations in the group reflects characteristics of the world situation of the translation courses (especially in some European countries and in the United States), which, in the first instance, does not accurately reflect the Brazilian idiosyncrasies of the area.

Gender and Advertisements: A Content Analysis of Pakistani Prime Time Advertisements

Advertisements carry a great potential to influence our lives because they are crafted to meet particular ends. Stereotypical representation in advertisements is capable of forming unconscious attitudes among people towards any gender and their abilities. This study focuses on gender representation in Pakistani prime time advertisements. For this purpose, 13 advertisements were selected from three different categories of foods and beverages, cosmetics, cell phones and cellular networks from the prime time slots of one of the leading Pakistani entertainment channel, ‘Urdu 1’. Both quantitative and qualitative analyses are carried out for range of variables like gender, age, roles, activities, setting, appearance and voice overs. The results revealed that gender representation in advertisements is stereotypical. Moreover, in few instances, the portrayal of women is not only culturally inappropriate but is demeaning to the image of women as well. Their bodily charm is used to promote products. Comparing different entertainment channels for their prime time advertisements and broadening the scope of this research will yield greater implications for the researchers who want to carry out the similar research. It is hoped that the current study would help in the promotion of media literacy among the viewers and media authorities in Pakistan.

The Study of Formal and Semantic Errors of Lexis by Persian EFL Learners

Producing a text in a language which is not one’s mother tongue can be a demanding task for language learners. Examining lexical errors committed by EFL learners is a challenging area of investigation which can shed light on the process of second language acquisition. Despite the considerable number of investigations into grammatical errors, few studies have tackled formal and semantic errors of lexis committed by EFL learners. The current study aimed at examining Persian learners’ formal and semantic errors of lexis in English. To this end, 60 students at three different proficiency levels were asked to write on 10 different topics in 10 separate sessions. Finally, 600 essays written by Persian EFL learners were collected, acting as the corpus of the study. An error taxonomy comprising formal and semantic errors was selected to analyze the corpus. The formal category covered misselection and misformation errors, while the semantic errors were classified into lexical, collocational and lexicogrammatical categories. Each category was further classified into subcategories depending on the identified errors. The results showed that there were 2583 errors in the corpus of 9600 words, among which, 2030 formal errors and 553 semantic errors were identified. The most frequent errors in the corpus included formal error commitment (78.6%), which were more prevalent at the advanced level (42.4%). The semantic errors (21.4%) were more frequent at the low intermediate level (40.5%). Among formal errors of lexis, the highest number of errors was devoted to misformation errors (98%), while misselection errors constituted 2% of the errors. Additionally, no significant differences were observed among the three semantic error subcategories, namely collocational, lexical choice and lexicogrammatical. The results of the study can shed light on the challenges faced by EFL learners in the second language acquisition process.

A Domain Specific Modeling Language Semantic Model for Artefact Orientation

Since the process of transforming user requirements to modeling constructs are not very well supported by domain-specific frameworks, it became necessary to integrate domain requirements with the specific architectures to achieve an integrated customizable solutions space via artifact orientation. Domain-specific modeling language specifications of model-driven engineering technologies focus more on requirements within a particular domain, which can be tailored to aid the domain expert in expressing domain concepts effectively. Modeling processes through domain-specific language formalisms are highly volatile due to dependencies on domain concepts or used process models. A capable solution is given by artifact orientation that stresses on the results rather than expressing a strict dependence on complicated platforms for model creation and development. Based on this premise, domain-specific methods for producing artifacts without having to take into account the complexity and variability of platforms for model definitions can be integrated to support customizable development. In this paper, we discuss methods for the integration capabilities and necessities within a common structure and semantics that contribute a metamodel for artifact-orientation, which leads to a reusable software layer with concrete syntax capable of determining design intents from domain expert. These concepts forming the language formalism are established from models explained within the oil and gas pipelines industry.

Machine Translation Analysis of Chinese Dish Names

This article presents a comparative study evaluating and comparing the quality of machine translation (MT) output of Chinese gastronomy nomenclature. Chinese gastronomic culture is experiencing an increased international acknowledgment nowadays. The nomenclature of Chinese gastronomy not only reflects a specific aspect of culture, but it is related to other areas of society such as philosophy, traditional medicine, etc. Chinese dish names are composed of several types of cultural references, such as ingredients, colors, flavors, culinary techniques, cooking utensils, toponyms, anthroponyms, metaphors, historical tales, among others. These cultural references act as one of the biggest difficulties in translation, in which the use of translation techniques is usually required. Regarding the lack of Chinese food-related translation studies, especially in Chinese-Spanish translation, and the current massive use of MT, the quality of the MT output of Chinese dish names is questioned. Fifty Chinese dish names with different types of cultural components were selected in order to complete this study. First, all of these dish names were translated by three different MT tools (Google Translate, Baidu Translate and Bing Translator). Second, a questionnaire was designed and completed by 12 Chinese online users (Chinese graduates of a Hispanic Philology major) in order to find out user preferences regarding the collected MT output. Finally, human translation techniques were observed and analyzed to identify what translation techniques would be observed more often in the preferred MT proposals. The result reveals that the MT output of the Chinese gastronomy nomenclature is not of high quality. It would be recommended not to trust the MT in occasions like restaurant menus, TV culinary shows, etc. However, the MT output could be used as an aid for tourists to have a general idea of a dish (the main ingredients, for example). Literal translation turned out to be the most observed technique, followed by borrowing, generalization and adaptation, while amplification, particularization and transposition were infrequently observed. Possibly because that the MT engines at present are limited to relate equivalent terms and offer literal translations without taking into account the whole context meaning of the dish name, which is essential to the application of those less observed techniques. This could give insight into the post-editing of the Chinese dish name translation. By observing and analyzing translation techniques in the proposals of the machine translators, the post-editors could better decide which techniques to apply in each case so as to correct mistakes and improve the quality of the translation.

Duration Patterns of English by Native British Speakers and Mandarin ESL Speakers

This study is intended to describe and analyze the effects of polysyllabic shortening and word or phrase boundary on the duration patterns of spoken utterances by Mandarin learners of English in comparison with native speakers of English. To investigate the relative contribution of these effects, two production experiments were conducted. The study included 11 native British English speakers and 20 Mandarin learners of English who were asked to produce four sets of tokens consisting of a mono-syllabic base form, disyllabic, and trisyllabic words derived from the base by the addition of suffixes, and a set of short sentences with a particular combination of phrase size, stress pattern, and boundary location. The duration of words and segments was measured, and results from the data analysis suggest that the amount of polysyllabic shortening and the effect of word or phrase position are likely to affect a Chinese accent for Mandarin ESL speakers. This study sheds light on research on the duration patterns of language by demonstrating the effect of duration-related factors on the foreign accent of Mandarin ESL speakers. It can also benefit both L2 learners and language teachers by increasing their sensitivity to the duration differences and difficulties experienced by L2 learners of English. An understanding of the amount of polysyllabic shortening and the effect of position in words and phrase on syllable duration can also facilitate L2 teachers to establish priorities for teaching pronunciation to ESL learners.

An Analysis of Language Borrowing among Algerian University Students Using Online Facebook Conversations

The rapid development of technology has led to an important context in which different languages and structures are used in the same conversations. This paper investigates the practice of language borrowing within social media platform, namely, Facebook among Algerian Vernacular Arabic (AVA) students. In other words, this study will explore how Algerian students have incorporated lexical English borrowing in their online conversations. This paper will examine the relationships between language, culture and identity among a multilingual group. The main objective is to determine the cultural and linguistic functions that borrowing fulfills in social media and to explain the possible factors underlying English borrowing. The nature of the study entails the use of an online research method that includes ten online Facebook conversations in the form of private messages collected from Bachelor and Masters Algerian students recruited from the English department at the University of Oum El-Bouaghi. The analysis of data revealed that social media platform provided the users with opportunities to shift from one language to another. This practice was noticed in students’ online conversations. English borrowing was the most relevant language performance in accordance with Arabic which is the mother tongue of the chosen sample. The analysis has assumed that participants are skilled in more than one language.

Analyzing Political Cartoons in Arabic-Language Media after Trump's Jerusalem Move: A Multimodal Discourse Perspective

Communication in the modern world is increasingly becoming multimodal due to globalization and the digital space we live in which have remarkably affected how people communicate. Accordingly, Multimodal Discourse Analysis (MDA) is an emerging paradigm in discourse studies with the underlying assumption that other semiotic resources such as images, colours, scientific symbolism, gestures, actions, music and sound, etc. combine with language in order to  communicate meaning. One of the effective multimodal media that combines both verbal and non-verbal elements to create meaning is political cartoons. Furthermore, since political and social issues are mirrored in political cartoons, these are regarded as potential objects of discourse analysis since they not only reflect the thoughts of the public but they also have the power to influence them. The aim of this paper is to analyze some selected cartoons on the recognition of Jerusalem as Israel's capital by the American President, Donald Trump, adopting a multimodal approach. More specifically, the present research examines how the various semiotic tools and resources utilized by the cartoonists function in projecting the intended meaning. Ten political cartoons, among a surge of editorial cartoons highlighted by the Anti-Defamation League (ADL) - an international Jewish non-governmental organization based in the United States - as publications in different Arabic-language newspapers in Egypt, Saudi Arabia, UAE, Oman, Iran and UK, were purposively selected for semiotic analysis. These editorial cartoons, all published during 6th–18th December 2017, invariably suggest one theme: Jewish and Israeli domination of the United States. The data were analyzed using the framework of Visual Social Semiotics. In accordance with this methodological framework, the selected visual compositions were analyzed in terms of three aspects of meaning: representational, interactive and compositional. In analyzing the selected cartoons, an interpretative approach is being adopted. This approach prioritizes depth to breadth and enables insightful analyses of the chosen cartoons. The findings of the study reveal that semiotic resources are key elements of political cartoons due to the inherent political communication they convey. It is proved that adequate interpretation of the three aspects of meaning is a prerequisite for understanding the intended meaning of political cartoons. It is recommended that further research should be conducted to provide more insightful analyses of political cartoons from a multimodal perspective.

The Efficiency of Association Measures in Automatic Extraction of Collocations: Exclusivity and Frequency

This paper deals with automatic extraction of 20 ‘adjective + noun’ collocations using four different association measures: T-score, MI, Log Dice, and Log Likelihood with most emphasis on mainly Log Likelihood and Log Dice scores for which an argument for their suitability in this experiment is to be presented. The nodes of the chosen collocates are 20 adjectival false friends between English and French. The noun candidate to be chosen needs to occur with a threshold of top ten collocates in two lists in which the results are sorted by Log Likelihood and Log Dice. The fulfillment of this criterion will guarantee that the chosen candidates are both exclusive and significant noun collocates and thereby, they make perfect noun candidates for the nodes. The results of the top 10 collocates sorted by Log Dice and Log Likelihood are not to be filtered. Thereby technical terms, function words, and stop words are not to be removed for the purposes of the analysis. Out of 20 adjectives, 15 ‘adjective + noun’ collocations have been extracted by the means of consensus of Log Likelihood and Log Dice scores on the top 10 noun collocates. The generated list of the automatic extracted ‘adjective + noun’ collocations will serve as the bulk of a translation test in which Algerian students of translation are asked to render these collocations into Arabic. The ultimate goal of this test is to test French influence as a Second Language on English as a Foreign Language in the Algerian context.

The Role of Ideophones: Phonological and Morphological Characteristics in Literature

Many Asian languages, such as Korean and Japanese, are well-known for their wide use of sound symbolic words or ideophones. This is a very particular characteristic which enriches its lexicon hugely. Ideophones are a class of sound symbolic words that utilize sound symbolism to express aspects, states, emotions, or conditions that can be experienced through the senses, such as shape, color, smell, action or movement. Ideophones have very particular characteristics in terms of sound symbolism and morphology, which distinguish them from other words. The phonological characteristics of ideophones are vowel ablaut or vowel gradation and consonant mutation. In the case of Korean, there are light vowels and dark vowels. Depending on the type of vowel that is used, the meaning will slightly change. Consonant mutation, also known as consonant ablaut, contributes to the level of intensity, emphasis, and volume of an expression. In addition to these phonological characteristics, there is one main morphological singularity, which is reduplication and it carries the meaning of continuity, repetition, intensity, emphasis, and plurality. All these characteristics play an important role in both linguistics and literature as they enhance the meaning of what is trying to be expressed with incredible semantic detail, expressiveness, and rhythm. The following study will analyze the ideophones used in a single paragraph of a Korean novel, which add incredible yet subtle detail to the meaning of the words, and advance the expressiveness and rhythm of the text. The results from analyzing one paragraph from a novel, after presenting the phonological and morphological characteristics of Korean ideophones, will evidence the important role that ideophones play in literature. 

Manipulation of Ideological Items in the Audiovisual Translation of Voiced-Over Documentaries in the Arab World

In a widely globalized world, the influence of audiovisual translation on the culture and identity of audiences is unmistakable. However, in the Arab World, there is a noticeable disproportion between this growing influence and the research carried out in the field. As a matter of fact, the voiced-over documentary is one of the most abundantly translated genres in the Arab World that carries lots of ideological elements which are in many cases rendered by manipulation. However, voiced-over documentaries have hardly received any focused attention from researchers in the Arab World. This paper attempts to scrutinize the process of translation of voiced-over documentaries in the Arab World, from French into Arabic in the present case study, by sub-categorizing the ideological items subject to manipulation, identifying the techniques utilized in their translation and exploring the potential extra-linguistic factors that prompt translation agents to opt for manipulative translation. The investigation is based on a corpus of 94 episodes taken from a series entitled 360° GEO Reports, produced by the French German network ARTE in French, and acquired, translated and aired by Al Jazeera Documentary Channel for Arab audiences. The results yielded 124 cases of manipulation in four sub-categories of ideological items, and the use of 10 different oblique procedures in the process of manipulative translation. The study also revealed that manipulation is in most of the instances dictated by the editorial line of the broadcasting channel, in addition to the religious, geopolitical and socio-cultural peculiarities of the target culture.

Collaborative Stylistic Group Project: A Drama Practical Analysis Application

In the course of teaching stylistics to undergraduate students of the Department of English Language and Literature, Faculty of Arts and Humanities, the linguistic tool kit of theories comes in handy and useful for the better understanding of the different literary genres: Poetry, drama, and short stories. In the present paper, a model of teaching of stylistics is compiled and suggested. It is a collaborative group project technique for use in the undergraduate diverse specialisms (Literature, Linguistics and Translation tracks) class. Students initially are introduced to the different linguistic tools and theories suitable for each literary genre. The second step is to apply these linguistic tools to texts. Students are required to watch videos performing the poems or play, for example, and search the net for interpretations of the texts by other authorities. They should be using a template (prepared by the researcher) that has guided questions leading students along in their analysis. Finally, a practical analysis would be written up using the practical analysis essay template (also prepared by the researcher). As per collaborative learning, all the steps include activities that are student-centered addressing differentiation and considering their three different specialisms. In the process of selecting the proper tools, the actual application and analysis discussion, students are given tasks that request their collaboration. They also work in small groups and the groups collaborate in seminars and group discussions. At the end of the course/module, students present their work also collaboratively and reflect and comment on their learning experience. The module/course uses a drama play that lends itself to the task: ‘The Bond’ by Amy Lowell and Robert Frost. The project results in an interpretation of its theme, characterization and plot. The linguistic tools are drawn from pragmatics, and discourse analysis among others.

Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English

The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based analysis of texts published in journals in Applied Linguistics (AL). The corpus used in this study contains texts published in the period from 2014 to 2018 in the three journals: Language Learning and Technology; English for Academic Purposes, and TESOL Quaterly, totaling more than one million words. A corpus-based analysis was carried out on the corpus to identify the most frequent adjectives that co-occurred in the three journals. By observing the concordance lines of the adjectives and analyzing the words they associated with, the semantic preferences of each adjective were determined. Later, the AL corpus analysis was compared to the investigation of the same adjectives in a corpus of Chemistry. This second part of the study aimed to identify possible differences and similarities between the two corpora in relation to the use of the adjectives in research articles from both areas. The results show that there are some preferences which seem to be closely related not only to the academic genre of the texts but also to the specific domain of the discipline and, to a lesser extent, to the context of research in each journal. This research illustrates a possible contribution of Corpus Linguistics to explore the concept of semantic preference in more detail, considering the complex nature of the phenomenon.

Measuring Text-Based Semantics Relatedness Using WordNet

Measuring semantic similarity between texts is calculating semantic relatedness between texts using various techniques. Our web application (Measuring Relatedness of Concepts-MRC) allows user to input two text corpuses and get semantic similarity percentage between both using WordNet. Our application goes through five stages for the computation of semantic relatedness. Those stages are: Preprocessing (extracts keywords from content), Feature Extraction (classification of words into Parts-of-Speech), Synonyms Extraction (retrieves synonyms against each keyword), Measuring Similarity (using keywords and synonyms, similarity is measured) and Visualization (graphical representation of similarity measure). Hence the user can measure similarity on basis of features as well. The end result is a percentage score and the word(s) which form the basis of similarity between both texts with use of different tools on same platform. In future work we look forward for a Web as a live corpus application that provides a simpler and user friendly tool to compare documents and extract useful information.

Measuring Text-Based Semantics Relatedness Using WordNet

Measuring semantic similarity between texts is calculating semantic relatedness between texts using various techniques. Our web application (Measuring Relatedness of Concepts-MRC) allows user to input two text corpuses and get semantic similarity percentage between both using WordNet. Our application goes through five stages for the computation of semantic relatedness. Those stages are: Preprocessing (extracts keywords from content), Feature Extraction (classification of words into Parts-of-Speech), Synonyms Extraction (retrieves synonyms against each keyword), Measuring Similarity (using keywords and synonyms, similarity is measured) and Visualization (graphical representation of similarity measure). Hence the user can measure similarity on basis of features as well. The end result is a percentage score and the word(s) which form the basis of similarity between both texts with use of different tools on same platform. In future work we look forward for a Web as a live corpus application that provides a simpler and user friendly tool to compare documents and extract useful information.