Bibliographic Metadata

Data Quality Measurement in Wide-Column Stores / submitted by Julia Hilber, BSc
Author: Hilber, Julia
Censor: Wöß, Wolfram
Published: Linz, 2018
Description: 95 pages : illustrations
Institutional Note: Universität Linz, Master's thesis, 2018
Document type: Master Thesis
Keywords (DE): Datenqualität / NoSQL / Cassandra
Keywords (EN): data quality / NoSQL / Cassandra
Keywords (GND): Datenqualität / NoSQL-Datenbanksystem / Apache Cassandra / MySQL
URN: urn:nbn:at:at-ubl:1-22748 (persistent identifier)
The work is publicly available
Data Quality Measurement in Wide-Column Stores [2.84 MB]
Abstract (English)

Many companies and organizations base their decisions on data stored in databases. It is therefore important to know the quality of that data, because poor data quality leads to bad decisions. Most data issues stem from human errors during data acquisition, faulty processes, wrongly designed architectures, inconsistent definitions, and incorrect usage of data. NoSQL stores are currently receiving much attention, driven by the unstructured data on the web, so assessing their quality is important as well. This thesis therefore provides an approach for assessing the data quality of a Cassandra store at the schema level only. The schema is more important than the instances because a schema change may not be possible when applications depend on it and do not allow changes, whereas instances are easier to assess and correct. Cassandra was chosen because it is the second most used database among the NoSQL stores. This work describes and implements an extension of the tool QuaIIe for assessing Cassandra schemas. The extension is based on a detailed analysis of the existing program, especially its connector for MySQL databases. The next step is the transformation of the Cassandra schema into the DSD vocabulary and a direct comparison of the data quality assessment between a Cassandra store and a MySQL database. Finally, the assessment of the Cassandra schema is performed and evaluated.
