Various forms of computer-mediated communication (CMC) have become ubiquitous and influence our lives in many ways. Facebook, Myspace, Skype, Twitter, WhatsApp, and YouTube produce enormous amounts of traffic and data, which is ideal for analysis. Automated tools for discourse analysis process this tremendous amount of computer-mediated discourse quickly. The aim of this thesis is to describe and develop a software architecture for an automated tool that analyzes computer-mediated discourses to answer the question “Who is communicating with whom?” at any point in time. Assigning receivers to each message is an important step. While direct addressing is helpful, it is not used in every message.
The author explores popular communication models and the most widely used CMC systems. The underlying communication model highlights the basic elements of CMC and shows how this communication takes place. Based on this understanding, multiple views are defined by using different attributes and various guiding questions. Practical examples explain which basic information can be extracted from text-based discourses, and how that is done. The author mainly focuses on Internet Relay Chat (IRC) as an applied example because of its open and well-documented protocol. In discourses, it is not always clear who is communicating with whom, which especially affects the automatic analysis of discourses. It is important to identify the users' nicknames in written discourse in order to determine who the respective senders and receivers are. However, the linguistic possibilities for creating nicknames, and for using them in the discourse, are varied.
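To illustrate what basic information a text-based IRC discourse carries, the following minimal sketch parses a raw `PRIVMSG` line in the format defined by the IRC protocol and checks for direct addressing at the start of the message. The function name `parse_privmsg`, the returned dictionary keys, and the "name followed by a colon or comma" addressing convention are assumptions made for this illustration; they are not taken from the tool described in this thesis.

```python
import re

# Raw IRC line as relayed by a server: ":nick!user@host PRIVMSG <target> :<text>"
PRIVMSG_RE = re.compile(
    r"^:(?P<nick>[^!\s]+)!(?P<user>[^@\s]+)@(?P<host>\S+) "
    r"PRIVMSG (?P<target>\S+) :(?P<text>.*)$"
)

# Assumed direct-addressing convention: the message starts with "name:" or "name,"
ADDRESS_RE = re.compile(r"^(?P<name>[^\s:,]+)[:,]\s+(?P<rest>.+)$")


def parse_privmsg(line, logged_in):
    """Return sender, target, text and (if directly addressed) receiver.

    `logged_in` is an iterable of nicknames currently in the channel.
    Returns None for lines that are not PRIVMSG messages.
    """
    match = PRIVMSG_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None

    # Simplification: compare nicknames case-insensitively via lower().
    known = {nick.lower(): nick for nick in logged_in}

    receiver = None
    addressed = ADDRESS_RE.match(match.group("text"))
    if addressed is not None:
        # Only accept the leading word if it names a logged-in user.
        receiver = known.get(addressed.group("name").lower())

    return {
        "sender": match.group("nick"),
        "target": match.group("target"),
        "text": match.group("text"),
        "receiver": receiver,
    }


if __name__ == "__main__":
    users = ["alice", "Bob"]
    print(parse_privmsg(":alice!a@example.org PRIVMSG #chat :bob: are you there?", users))
    print(parse_privmsg(":alice!a@example.org PRIVMSG #chat :hello everyone", users))
```

The sender is always available from the protocol prefix, whereas the receiver is only found when the message happens to begin with a known nickname. Messages without such an address leave `receiver` empty.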
To study how nicknames are created and used in IRC, logs of 13 channels, consisting of 8937 public chat messages and 7936 unique nicknames, are analyzed in detail. This thesis shows the basic structure of IRC nicknames, which part-of-speech groups are used to compound nicknames, and which parts of speech of a nickname are omitted within the chat discourse. This knowledge leads to a better prediction as to whether there is a link between a currently logged-in user and the examined word in discourse, which can be a shortened or creatively changed form of a nickname. Additionally, this work improves two other aspects: first, automated detection and mapping of written receiver names (or parts thereof) to logged-in users; and second, automated receiver guessing without semantics if no receiver name is specified. The architecture of the automated software is described in detail. An IRC discourse with 5605 messages is manually and automatically analyzed, and both approaches achieve similar results in detecting and guessing sender-receiver relations.
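The two aspects named above, mapping a written (possibly shortened) name to a logged-in user and guessing a receiver when none is written, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the prefix-matching rule, the `min_len` threshold, the "most recent other sender" fallback, and the function names `nick_parts`, `match_receiver`, and `guess_receiver` are hypothetical and are not the method developed in this thesis.

```python
import re


def nick_parts(nick):
    """Split a nickname into lowercase alphabetic parts.

    Example: 'Dark_Angel42' yields ['dark', 'angel'].
    """
    return [part.lower() for part in re.findall(r"[A-Za-z]+", nick)]


def match_receiver(word, logged_in, min_len=3):
    """Return all logged-in nicknames that `word` may refer to.

    Assumed rule: the word matches if the full nickname, its joined
    alphabetic parts, or any single part starts with the word.
    """
    w = word.lower().strip(".,:;!?")
    if len(w) < min_len:  # very short words are too ambiguous
        return []
    hits = []
    for nick in logged_in:
        parts = nick_parts(nick)
        candidates = [nick.lower(), "".join(parts)] + parts
        if any(c.startswith(w) for c in candidates):
            hits.append(nick)
    return hits


def guess_receiver(previous_senders, sender):
    """Assumed fallback without semantics: the most recent other sender."""
    for prev in reversed(previous_senders):
        if prev != sender:
            return prev
    return None


if __name__ == "__main__":
    users = ["Dark_Angel42", "bob", "Angela"]
    print(match_receiver("dark", users))
    print(match_receiver("angel,", users))
    print(guess_receiver(["alice", "bob", "alice"], "alice"))
```

A word such as "angel" can match more than one logged-in user, so a real tool needs a further rule to resolve ambiguous hits. The sketch simply returns every candidate.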