Metadata integration tool for open educational resources

Metadata is organized information that defines, explains, and facilitates the retrieval and management of information resources. Web technologies are a mechanism that contributes to the growing implementation of electronic learning platforms.

Category: Communications, telecommunications, digital devices and radio electronics
Type: diploma thesis
Language: English
Date added: 01.09.2018
File size: 1.5 M


Posted on http://www.allbest.ru


Introduction

The Internet is filled with under-utilized open educational resources (OER) because educational platforms have proliferated, each implementing a different metadata scheme. This study explores how metadata integration combined with metadata analysis techniques can provide more effective support and structured metadata for OER. The goal is to improve the visibility of OER by designing a toolkit capable of integrating and analyzing OER metadata and evaluating its quality. To design the toolkit, a three-step metadata schema matching template was adopted, comprising pre-integration, schema comparison, and merging stages. Pre-integration selects the schemas to be integrated; the schemas are then analyzed to identify corresponding or similar elements; and lastly, the metadata schemas are merged to present a unified metadata view. Feature analysis was conducted on the resultant metadata to better understand the inherent properties of OER. The OER quality evaluation outcomes demonstrate that the proposed metrics were quite sensitive, as their values reflect identifiable quality flaws in the dataset.

1. Theoretical part

This chapter presents the introductory concepts of this research work. It introduces the idea of open education by citing notable examples. Next, the justification for the research is established, leading to a brief statement of the aim and objectives. The research methodology adopted is stated and the contributions are put in perspective.

1.1 Background

The ideals of Open Education are premised on the public good, meaning that functional human capital development depends on quality education. The concept was therefore intended to introduce convenience into the education landscape and to lessen access barriers of place, time, economy, geography, and age [1].

This open education campaign spurred other educational initiatives, for example MIT OpenCourseWare (OCW) [27]. This initiative aggregates educational materials and courses (undergraduate and graduate) to widen access for interested users. The popularity of MIT's OCW [24] stimulated other notable academic institutions to imitate the initiative, though in different manners and approaches. This trend creates a condition whereby more content is made accessible to disparate categories of users.

In retrospect, this idea stimulated the progressive growth of digital learning systems, which reached the public space in the early 1990s. The growing wave of education technology projects led many education-oriented institutions and communities to adopt the term Open Educational Resources (OER) at a UNESCO-organized conference in 2002 [29]. Participants at this conference devised the term 'OER' and defined it as the provision of education, enabled by information technology, for purposes such as adaptation, teaching, research, consultation, and other non-commercial uses.

The openness of OER is guided by some global conventions, including the four R's (4R) model espoused by Ischinger (2007) [5].

· Reuse. This is the core distinction of OER: it gives users permission to use OER content in part or in whole.

· Redistribute. This feature authorises users to circulate and share OERs with other users.

· Revise. With this feature, users can fine-tune and amend the properties of learning materials.

· Remix. This gives users the freedom to combine pre-existing OERs to produce new OERs.

The growth of educational content is approaching a peak. This situation thus provides alternatives for opening up more educational opportunities at all stages of learning. Insights from the literature [5, 8, 10, 11, 15] divide OER into:

· Learning Content: The medium that accommodates learning resources such as textbooks, journal articles, magazines, etc.

· Tools: Software application used in designing, developing, and preserving OERs (e.g., content management systems).

· Implementation Licenses: The rules regulating the use and design of OER content and systems.

Massive open online courses (MOOCs), as an idea, aim at scaling up OER adoption to improve the fortunes of OERs and drive down associated costs. Recently, MOOCs have enjoyed considerable attention from social entrepreneurs, media outlets, and educationists [29].

MOOCs provide a platform through which interested users can take high-quality courses offered by top-ranked educational institutions. On one side, this seems a strong trend in global education, as MOOCs actively use modern technologies such as mobile and network services to support the learning process. On the other side, MOOCs challenge traditional classroom learning, thereby removing the emotional attachment to physical locations.

Educational resources are core constituents of any educational system. The wide acceptance of 'open education' facilitates the launch of new open learning platforms with the capacity to store rich educational metadata. The discoverability of OERs could be further enhanced and made effective by linking them with standard metadata. Metadata plays a prime role in successfully searching and classifying educational collections using certain parameters. However, keyword search can be less effective, and in some settings it does not work well for extracting attributes (e.g., author, title, subject area, etc.), given the high incidence of non-textual materials among OER. On OER systems, some metadata attributes are not sufficiently covered by generic metadata standards.

As regards quality, metadata evaluation appears to be a useful benchmark for measuring an OER's discoverability and fitness for specific use cases. Also, some metadata evaluation implies comparing a certain standard with other educational metadata standards.

1.2 Motivation

Web technology facilitates the growing pace of deployment of electronic learning platforms. Consequently, it is steadily becoming harder for users to come across good OER content online because of this proliferation. This issue also extends to content availability, access, costs, and information quality. Owing to the swelling demand for information, the growth in educational platforms, and the accompanying opportunities, it is essential to address some salient issues concerning OER discoverability, availability, and quality by dwelling on OER metadata features, elements, content, and platforms.

1.3 Problem Statement

Notable enthusiasts and promoters have invested heavily in the continued spread of open education. A good illustration of this effort is Stanford University's MOOC platform, launched in 2001 [27, 31]. This innovation attracted a large number of users (learners, tutors, content creators) to free online courses. Since then, the acceptance of MOOCs has kept rising. This includes course offerings from numerous educational service providers via different MOOC platforms.

Notwithstanding the ideas fueling the 'open education initiative', fully harnessing its benefits is still a challenge. This manifests in several forms and at different phases. The main challenge, however, revolves around the difficulties faced in harnessing the potential of OERs. This problem creates a situation where the Internet is burdened with an enormous volume of under-utilized OERs.

Moreover, the tasks of locating OERs, evaluating their associated quality features, linking them with other resources, and sharing them with other users are left to individual users. These issues, among others, form the focal points of this inquiry. The goal, therefore, is to design, after thorough analysis of the highlighted problems, a metadata toolkit that provides solutions to these challenges, enhances visibility, and reduces the incidence of under-utilized learning resources.

1.4 Goal and Objectives

From the foregoing, the goal, therefore, is to upscale the visibility and usage of OER. This was accomplished by leveraging extant knowledge on electronic learning technology and metadata management.

The objectives are to develop:

• a reliable method to extract OER metadata from major e-learning collections;

• an efficient method to assess OER metadata quality features;

• a mini repository to warehouse and integrate the resultant OER metadata; and

• a prototype metadata portal to analyze and present results to OER users.

1.5 Research Methodology

In pursuing the aims of this research, two metadata management methods (template-based and linked data) were adopted. The first technique involves extraction of technical and descriptive data (e.g., title, author, course description, etc.). This permits the segregation of incoming metadata into appropriate groupings based on extraction rules.

Web technology helps in linking interrelated documents. To show the anticipated interconnection of OERs, Linked Data was adopted, which is a method of publishing structured or semi-structured data so that it can be interlinked. This linkage facilitates semantic queries. The idea builds on technologies such as HTTP, URIs, and RDF [7, 12, 19] to present and share metadata across platforms.
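To make the Linked Data idea concrete, the following is a minimal sketch, not the toolkit's actual implementation, of how OER metadata might be published as RDF triples in N-Triples syntax. The base URI `http://example.org/oer`, the course slugs, and the helper names are assumptions made for illustration; only the Dublin Core terms namespace is a real vocabulary.

```python
from urllib.parse import quote

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def course_uri(base, slug):
    """Mint an HTTP URI for a resource (base and slug are hypothetical)."""
    return base.rstrip("/") + "/" + quote(slug)


def escape_literal(value):
    """Escape characters that N-Triples does not allow raw inside a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(subject, properties, links=()):
    """Serialise literal properties and resource-to-resource links."""
    lines = []
    for term, value in properties.items():
        lines.append(f'<{subject}> <{DCTERMS}{term}> "{escape_literal(value)}" .')
    for term, target in links:
        lines.append(f"<{subject}> <{DCTERMS}{term}> <{target}> .")
    return "\n".join(lines)


intro = course_uri("http://example.org/oer", "intro to rdf")
advanced = course_uri("http://example.org/oer", "advanced-rdf")
triples = to_ntriples(
    intro,
    {"title": "Intro to RDF", "creator": "A. Author"},
    links=[("relation", advanced)],  # the link that connects two OERs
)
print(triples)
```

The last triple is what distinguishes Linked Data from an isolated record: its object is another URI rather than a literal, so a client can follow it to the related resource.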

1.6 Contributions

This work is geared towards increasing OER visibility by combining proven metadata management methods. Its contributions therefore include:

· Aggregation of OER metadata in a single-target repository.

· Synchronization of OER metadata with other OERs on the Web.

· A portal (prototype) to make high-quality OER metadata available to users.

1.7 Thesis Organisation

Chapter 2: This chapter examines metadata management, categories of metadata, and the metadata lifecycle. Drawing extensively on the literature, it examines the metadata standards relevant to OERs and gives a broad review of them.

Chapter 3: This chapter presents a wide-ranging survey of current methods for automatic generation of metadata. In addition, it profiles automated tools for generating metadata and presents the results of the quality evaluation performed on the generated OER metadata.

Chapter 4: This chapter synthesizes the view of combining metadata integration and metadata analysis techniques to support structured metadata for OERs.

Chapter 5: This chapter summarises the major concerns raised and discussed in previous chapters and sets directions for future work.

2. Metadata schemes for open educational resources

This chapter examines some metadata standards related to educational records. It further discusses the evolution of metadata from one lifecycle stage to another, as well as the different groupings of metadata and their applicability to open education.

2.1 Metadata

According to the National Information Standards Organization [13], metadata is organized information that defines, explains, and facilitates the retrieval and management of information resources. Metadata provides the information needed to make sense of concepts (e.g., classification schemes), data (e.g., datasets, images, etc.), and real-world entities (e.g., establishments, places, etc.).

A metadata scheme has three components (semantics, content, and syntax), which are usually embedded in electronic resources or stored separately for reference purposes [3].

There are several reasons for managing metadata, chief among them:

· Availability: metadata is organized so that it can be accessed easily and with minimal effort.

· Quality: well-structured metadata provides quality information, which is measurable using certain quality indicators or metrics.

· Persistence: metadata preservation guarantees relevance and integrity.

· Open License: a licensing regime is essential in making metadata available for public consumption.

2.2 Metadata Classifications

Within the context of educational resources, metadata is addressed mainly as descriptive information. This definition facilitates a robust depiction of educational materials for numerous purposes, including identification. Several studies on metadata have proposed diverse metadata types and classifications [6, 17, 18, 19, 23].

Administrative metadata refers to the data required to manage a large pool of OER resources. It includes technical metadata, that is, information about digital files needed to interpret raw data streams and present them in readable formats. Preservative metadata supports long-term use, planning, and future migration of digital files.

Table 2.1 Metadata Categories

Type | Use | Examples
Administrative | Used for administering digital collections | Acquisition information; rights and replication tracking; legal requirements/documentation
Descriptive | Describes the attributes of information artefacts | Cataloging records; finding aids; specific indexes
Preservative | Used for preserving information resources | Documentation of the physical state of resources; documentation of activities needed for preserving resources
Technical | Describes the behavior of digital systems | System (hardware and software) configuration documentation; technical and digitization information; monitoring of response times; information security data
Use | Describes properties such as volume, level, type, etc. | Circulation records; user tracking; content reuse

2.3 Metadata Lifecycle

A lifecycle is a step-wise procedure that generally involves sequential development phases and changes. In the metadata domain, it typically revolves around three stages, namely:

1. Collection: vital information is identified and captured in a central repository.

2. Data architecture maintenance: this stage supports the consistency of metadata vis-à-vis the operating environment.

3. Deployment: delivery of quality metadata to users via e-learning technologies.

2.4 OER Metadata Standards

Standards in metadata administration denote a set of rules governing the application and usage of metadata in any domain. A metadata scheme is an aggregate of descriptive details attached to informational objects.

A sizeable number of standard schemes have arisen from decades of intensive research to address specific requirements for the use and management of educational information. These schemes grew out of the needs of particular user groups to standardize how they categorize information. Some of these metadata systems were created to address the needs of individual domains, while others have generic and wide applications.

In the literature, comparative analyses of metadata schemes have raised many selection issues. The IEEE Learning Object Metadata (LOM) is a prominent standard that enjoys patronage in many educational domains [17]. Aside from LOM, other popular education-oriented standards include IMS, the Learning Resource Metadata Initiative (LRMI), and Dublin Core. LRMI, which is an offshoot of Schema.org, is also enjoying some recognition. Schema.org is a collaborative community focused on promoting schemas that support structured data for Web applications, for example, web pages and email messages.

2.5 The IEEE Learning Object Metadata (LOM)

This is a widely recognized metadata scheme published and promoted by the IEEE Learning Technology Standards Committee [12, 22] to enable adequate description of OER. The scheme is a multi-part standard that defines LOM as a data model. This model clearly specifies the aspects of an OER that should be described and the kinds of vocabularies applicable. In addition, it outlines how the data model may be modified.

Figure 2.1. LOM Model

The LOM encloses a class of attributes required for managing and evaluating learning objects. These elements are in nine (9) categories:

1. General category: contains generic information that defines learning resources as a whole.

2. Lifecycle category: comprises the historical features and present status of OER.

3. Meta-Metadata category: encloses information about metadata instances instead of OER that the metadata instance describes.

4. Technical category: clusters technical features or characteristics of the resources.

5. Educational category: holds the educational and pedagogic features of OER.

6. Rights category: contains condition for using and distributing OER.

7. Relation category: contains relationship information among OERs.

8. Annotation category: contains user comments on OER.

9. Classification category: describes OER relative to a certain classification system.

In each category, metadata elements are hierarchically organized and values are assigned accordingly. For example, the LOM standard outlines how metadata elements are represented in XML [19] by providing one distinct XML schema and other options which can be tailored to a particular application profile of the data model.
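The hierarchical organization of LOM elements into categories can be illustrated with a short sketch. It is deliberately simplified and rests on stated assumptions: the tree carries no XML namespace, values are stored as plain text rather than the language-tagged strings of the official binding, and `build_lom` is a hypothetical helper, so the output is LOM-like rather than schema-valid LOM.

```python
import xml.etree.ElementTree as ET

# The nine LOM categories, in the order listed above.
LOM_CATEGORIES = [
    "general", "lifeCycle", "metaMetadata", "technical", "educational",
    "rights", "relation", "annotation", "classification",
]


def build_lom(record):
    """Build a simplified, namespace-free LOM-like tree from a nested dict.

    `record` maps a category name to a dict of element name -> text value.
    """
    root = ET.Element("lom")
    for category in LOM_CATEGORIES:
        fields = record.get(category)
        if not fields:
            continue  # categories without data are omitted
        node = ET.SubElement(root, category)
        for name, value in fields.items():
            ET.SubElement(node, name).text = value
    return root


lom = build_lom({
    "general": {"title": "Linear Algebra", "language": "en"},
    "rights": {"description": "CC BY 4.0"},
})
xml_text = ET.tostring(lom, encoding="unicode")
print(xml_text)
```

Iterating over the fixed category list, rather than over the input dict, keeps the serialized categories in a stable order regardless of how the record was assembled.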

2.6 IMS Metadata

This scheme was advanced by the IMS Global Learning Consortium [22] to promote adoption of a technical specification that supports interoperable open learning technology. Recent IMS versions implement an XML binding method and the IEEE data model to support OERs [31]. These guidelines typically allow OER creators to provide interoperable instructional materials across learning management systems, authoring tools, and run-time environments.

2.7 Dublin Core (DC)

The Dublin Core Metadata Initiative [9] is a forum that focuses on developing metadata standards that support a wide range of data models applicable in different domains. Its schema comprises a considerable number of vocabularies at two different levels (simple and qualified) [6].

The simple version contains fifteen elements, complemented by a set of element qualifiers that refine the semantics of the elements to aid resource discovery. The fifteen elements are: Title, Description, Contributor, Type, Source, Relation, Subject, Coverage, Creator, Publisher, Rights, Date, Identifier, Format and Language. Additionally, qualified DC includes three further elements: Audience, Provenance, and RightsHolder.

These metadata elements can offer extended cataloging data and support indexing operations. While DC metadata elements are useful for generic applications, they do not include attributes that could describe the pedagogical viewpoint of a document.

Table 2.2. Dublin Core Elements

2.8 Learning Resource Metadata Initiative (LRMI)

LRMI was introduced by the Association of Educational Publishers and Creative Commons [X] to institute a common language for describing OER. This initiative produced a standard metadata structure for labeling learning resources. The initiative is historically the first industry-specific and independently developed framework that helps to improve OER discovery. Some of the literature [21, 22, 23] reports that this framework for labeling and organizing OER increased the interoperability and transparency of educational repositories.

The LRMI specification is a collection of OER property descriptors built on the all-encompassing vocabulary provided by Schema.org [21]. A version of the LRMI specification was integrated with Schema.org. Table 2.2 (LRMI Specification) presents that version's metadata properties.

Table 2.2. LRMI Specification

Property | Expected Type | Description
educationalRole | Text | The role of intended audience.
educationalAlignment | AlignmentObject | A connection to a well-known educational framework.
educationalUse | Text | The purpose of OER in educational sense or context.
timeRequired | Duration | Approximate period for target audience to connect with OER.
typicalAgeRange | Text | Age range of envisioned OER end user.
interactivityType | Text | The prevalent learning mode supported by OER.
learningResourceType | Text | The predominant OER type.
useRightsURL | URL | It encapsulates the conditions for using OER.
isBasedOnURL | URL | A URL component for creating OER.

Schema.org includes metadata representations for basic education-related information such as author, datePublished, publisher, etc.

It makes provision for specific metadata types like ImageObject, AudioObject, VideoObject, Article, etc. LRMI recognized six (6) properties already covered by Schema.org that are pertinent to OER. Table 2.3 summarises these attributes.

Table 2.3. Schema.org's Metadata

Property | Expected Type | Description
name | schema.org/Text | The title of OER.
about | schema.org/Text | The domain of focus for OER.
dateCreated | schema.org/Date | The date of OER creation.
author | schema.org/Person | The individual/organization responsible for creating OER.
publisher | schema.org/Organization | The organization responsible for publishing OER.
inLanguage | schema.org/Language | The main language of OER.

To normalize website markup, LRMI extends some elements of Schema.org. LRMI has to some extent become a reference standard for labeling educational content online. This is what a typical search engine will leverage to extract quality metadata from Web resources.
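As an illustration of how the Schema.org and LRMI properties tabulated above could be combined in page markup, the sketch below assembles a JSON-LD description of a single OER. The function name `oer_jsonld`, the choice of `CreativeWork` as the type, and all sample values are assumptions for illustration; the property names are those listed in the two tables.

```python
import json


def oer_jsonld(name, about, author, language, **lrmi):
    """Assemble a schema.org description of an OER as a JSON-LD dict."""
    doc = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "about": about,
        "author": {"@type": "Person", "name": author},
        "inLanguage": language,
    }
    doc.update(lrmi)  # LRMI properties such as educationalUse, timeRequired
    return doc


doc = oer_jsonld(
    "Introduction to Statistics",
    "Statistics",
    "A. Author",
    "en",
    educationalUse="assignment",
    learningResourceType="lecture",
    typicalAgeRange="18-",
    timeRequired="PT2H",  # ISO 8601 duration: two hours
)
print(json.dumps(doc, indent=2))
```

Embedding such a document in a page's markup is one way a crawler could obtain structured educational metadata without parsing the visible content.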

3. Metadata generation and quality assessment

This chapter surveys contemporary methods for generating metadata automatically. In addition, it profiles automated tools for generating metadata and presents some results of the comparative analysis performed on those tools, based on existing knowledge in the literature.

3.1 Metadata Generation

Generally, metadata in any form can be generated manually, automatically, or by both means. The automatic mode relies on information extraction algorithms to generate and structure metadata. Greenberg [12] described two methods of metadata auto-generation: harvesting and metadata element extraction.

Metadata extraction, for instance, uses methods such as automatic indexing and information retrieval to produce metadata for educational resources. Metadata harvesting refers to the practice of pooling metadata from separate repositories, which can be manual or automatic. After a successful harvest, the resultant metadata is often transferred to a selected repository for future use (e.g., information retrieval). In addition, researchers have advanced other context-specific techniques for generating metadata.

Polfreman [23] highlights some techniques with high usage statistics in other data processing domains. They include meta-tag harvesting, content extraction, automatic indexing, text and data mining, extrinsic data auto-generation, and tagging. The works of both Greenberg [10] and Polfreman [23] offer a comprehensive classification of the methods applied in building some trending metadata auto-generation tools. To explore this idea further, subsequent parts of this section are devoted to a broad survey and discussions on these techniques and corresponding tools.

Meta-Tag Extraction.

In meta-tag extraction, the values of metadata entities are identified and populated by inspecting the metadata tags associated with a resource. This is another way of harvesting metadata and converting it into other formats; MarcEdit [24] is a widely used tool in this regard.

MarcEdit generates metadata from OAI-PMH-compliant sources, and can therefore produce metadata from many repositories [16]. The protocol enables conversion of metadata into a range of formats such as MAchine-Readable Cataloguing in XML (MARC XML) [18].
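Harvesting over OAI-PMH can be sketched in a few lines. The example below builds a `ListRecords` request URL and parses a Dublin Core response; to stay runnable offline it parses an inline sample rather than querying a server. The repository address `http://example.org/oai` and the sample record are hypothetical, and the sketch ignores resumption tokens and error responses that a real harvester would have to handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Return one dict per record, mapping DC element name -> list of values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        fields = {}
        for element in record.iter():
            if element.tag.startswith("{" + DC_NS + "}"):
                name = element.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append(fields)
    return records


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Open Algebra</dc:title>
          <dc:creator>A. Author</dc:creator>
          <dc:subject>Mathematics</dc:subject>
          <dc:subject>Algebra</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
harvested = parse_records(SAMPLE)
print(harvested)
```

Values are collected into lists because Dublin Core elements are repeatable, as the two `dc:subject` entries in the sample show.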

Table 3.1. Meta-tag extraction tools comparison

Tool Name | Technique | Functions
Apache POI - Text Extractor | content extractor; meta-tag harvester; extrinsic auto-generator | It delivers simple text extraction for all supported file formats, and can access metadata linked with a given file, such as title and author.
Apache Tika | content extractor; meta-tag harvester; extrinsic auto-generator | It detects and extracts metadata from different targeted sources.
Ariadne Harvester | meta-tag harvester | It harvests OAI-PMH-compliant records, which can be converted to other schemas, e.g., Learning Object Metadata (LOM).
Data Fountains | content extractor; automatic indexer; meta-tag harvester | It extracts information embedded in HTML meta-tags.
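As a rough illustration of the meta-tag extraction technique discussed in this subsection, the sketch below collects the page title and the `name`/`content` pairs of HTML meta-tags using only the standard library. It does not reproduce how any of the tools in Table 3.1 work internally; the class name and the sample page are assumptions.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            # Lower-case the name so DC.creator and dc.creator are one key.
            self.metadata[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()


PAGE = """<html><head>
<title>Intro to Python</title>
<meta name="description" content="A free beginner course.">
<meta name="keywords" content="python, programming, OER">
<meta name="DC.creator" content="A. Author">
</head><body><p>Welcome</p></body></html>"""

parser = MetaTagExtractor()
parser.feed(PAGE)
print(parser.metadata)
```

Meta-tags without a `name` attribute, such as `charset` declarations, are skipped because they carry no descriptive metadata.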

Content Extraction.

This process combines data extraction techniques for pulling metadata from information resources. These methods, however, are hinged on the discovery or identification of appropriate meta-tags for the population of metadata values.

The Kea application [25], developed by the New Zealand Digital Library project, is a prominent example of this technique. This model combines machine learning with term frequency-inverse document frequency (TF-IDF) to identify and draw keywords from OER content [6, 7].
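The TF-IDF component of this approach can be sketched as follows. This is not Kea itself: Kea additionally trains a classifier on example documents, whereas the sketch only ranks a document's terms by TF-IDF score against a small collection. The stopword list, tokenizer, and sample documents are assumptions for illustration.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(documents, index, top_n=3):
    """Rank the terms of documents[index] by TF-IDF against the collection."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    tokens = tokenized[index]
    counts = Counter(tokens)
    n_docs = len(documents)
    scores = {
        term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


docs = [
    "Linear algebra course covering matrices and vector spaces. "
    "Matrices are used throughout the course.",
    "A course in world history covering ancient empires.",
    "Introductory course on organic chemistry and reactions.",
]
keywords = tfidf_keywords(docs, 0)
print(keywords)
```

A term such as "course" that appears in every document receives a score of zero, which is the property that makes TF-IDF useful for separating subject keywords from collection-wide vocabulary.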

Table 3.2. Content extraction tools comparison

Tool Name | Technique | Functions
CrossRef | content extractor | This Web service returns the Digital Object Identifier for submitted references.
Apache Tika | content extractor; meta-tag harvester; extrinsic auto-generator | It identifies and extracts metadata and text content from many sources.
CamEdit | content extractor | It allows metadata auto-creation to form clusters of interrelated resources.
Data Fountains | content extractor; automatic indexer; meta-tag harvester | It looks through HTML documents and extracts data contained in meta-tags.
Apache Stanbol | content extractor; automatic indexer | It extracts semantic metadata from text files and PDFs.

Automatic Indexing.

Automatic indexing uses rule-based methods to extract values embedded in potential information resources, rather than depending on the meta-tag content applied to resources. This technique also involves mapping the extracted metadata terms to controlled vocabularies [4, 14], for example the Library of Congress Subject Headings (LCSH), or to domain-specific or locally developed ontologies.
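The mapping of extracted terms to a controlled vocabulary can be sketched with a small lookup table. The vocabulary below is a hypothetical miniature invented for illustration, not an excerpt from LCSH, and the whole-phrase matching rule is only one of many possible indexing rules.

```python
import re

# Hypothetical miniature vocabulary: preferred heading -> variant terms.
VOCABULARY = {
    "Mathematics": {"math", "maths", "mathematics"},
    "Computer science": {"computing", "computer science", "cs"},
    "Chemistry": {"chemistry", "chem"},
}

# Invert once so that each variant points at its preferred heading.
VARIANT_TO_HEADING = {
    variant: heading
    for heading, variants in VOCABULARY.items()
    for variant in variants
}


def index_terms(text):
    """Return the sorted preferred headings whose variants occur in the text."""
    normalized = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    # Pad with spaces so variants only match as whole words or phrases.
    padded = " " + " ".join(normalized.split()) + " "
    found = {
        heading
        for variant, heading in VARIANT_TO_HEADING.items()
        if " " + variant + " " in padded
    }
    return sorted(found)


headings = index_terms("An intro to computing and maths for chem students")
print(headings)
```

Because every variant resolves to one preferred heading, records that describe the same subject in different words end up indexed under the same term.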

Table 3.3. Automatic indexing tools comparison

Tool Name | Technique | Functions
Editor-Converter Dublin Core Metadata | meta-tag harvester; extrinsic auto-generator | It scans HTML documents to sift metadata out of tags and adapts them to Dublin Core.
DSpace | meta-tag harvester; extrinsic auto-generator | It automatically extracts technical information (file format, size, etc.).
Apache Tika | content extractor; meta-tag harvester | It identifies and extracts metadata from several targeted sources.
Data Fountains | content extractor; automatic indexer; meta-tag harvester | It looks through HTML documents and then extracts information enclosed in meta-tags.

Text/Data Mining.

Two of the methods discussed above (content extraction and automatic indexing) work with text/data mining methods for extracting metadata elements. These methods use machine-learning algorithms; statistical analysis of terms and frequencies; clustering techniques, that is, techniques that examine the frequency of term use across documents instead of relying on controlled vocabularies; and classification techniques, that is, techniques that exploit the conventional structure of documents for metadata auto-generation.

Extrinsic Data Auto Generation.

This method is another means to extract external information about digital resources [2, 17]. In the OER context, this technique is suitable for ascertaining the reputation of learning resources.

Table 3.4. Extrinsic data auto-generation tools comparison

Tool Name | Technique | Functions
Embedded Metadata Extraction Tool (EMET) | content extractor; meta-tag harvester; extrinsic auto-generator | This tool is best suited for pulling out metadata embedded in JPEG and TIFF files.
Firefox Dublin Core Viewer Extension | meta-tag harvester; extrinsic auto-generator | It scans through HTML documents, draws out metadata from tags, and displays them in Dublin Core format.
Omeka | extrinsic auto-generator; social tagging | It extracts technical information, for example, file format, size, etc.
JHove | extrinsic auto-generator | It extracts metadata and validates the structure of file formats.
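Several tools in Table 3.4 derive technical information such as file format and size without reading the resource's content. A minimal sketch of that kind of extrinsic auto-generation, using only the standard library, is given below. The function name and the output keys are assumptions, and format detection here relies on the file extension alone, which is cruder than the format validation a tool such as JHove performs.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def technical_metadata(path):
    """Derive extrinsic metadata from the file system, not from file content."""
    stat = os.stat(path)
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the extension
    return {
        "format": mime_type or "application/octet-stream",
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
    }


# Demonstrate on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "lecture.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("hello")
    info = technical_metadata(sample)

print(info)
```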

3.2 Metadata Extraction

Having synthesized ideas from the above survey, the metadata tool was designed to retrieve inputs from three major sources. First, the Coursera RESTful API was used, along with JSON, to fetch metadata from the Coursera course catalog in the DC specification. Second, the edX course RSS feed was used to retrieve educational data in LOM format. The last stream of data was obtained from OER Commons through the XPath extraction method. These data come in HTML format, so the elements or attributes were extracted in two phases (meta-tags extraction and content extraction) and converted to the IMS metadata standard.

· Meta-tags extraction: this module was designed to extract basic metadata from URLs and meta-tags (title, description, and keyword) on e-learning platforms, OER data stores, etc. The meta-tag extraction was implemented to identify keywords necessary for searching e-learning platforms.
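The RSS route can be illustrated with a short sketch that converts the items of an RSS 2.0 feed into LOM-style fields. The feed below is an invented sample rather than real edX data, and the mapping from RSS children to dotted LOM paths is an assumption made for illustration, not the mapping used by the tool.

```python
import xml.etree.ElementTree as ET

# RSS 2.0 item child -> LOM-style path (an assumed, illustrative mapping).
RSS_TO_LOM = {
    "title": "general.title",
    "link": "general.identifier",
    "description": "general.description",
    "pubDate": "lifeCycle.contribute.date",
}


def rss_to_lom(xml_text):
    """Convert each <item> of an RSS 2.0 feed into a flat LOM-style dict."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iterfind("./channel/item"):
        record = {}
        for rss_name, lom_path in RSS_TO_LOM.items():
            value = item.findtext(rss_name)
            if value and value.strip():
                record[lom_path] = value.strip()
        records.append(record)
    return records


FEED = """<rss version="2.0"><channel>
  <title>Example course feed</title>
  <item>
    <title>Data Structures</title>
    <link>http://example.org/courses/ds</link>
    <description>Lists, trees and graphs.</description>
  </item>
</channel></rss>"""

feed_records = rss_to_lom(FEED)
print(feed_records)
```

Fields absent from an item, such as `pubDate` in the sample, are simply left out of the record, which keeps missing data visible to a later completeness check.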

Figure 3.1. Meta-tags extraction from major e-learning platforms

· Content Extraction: this module obtained important OER information in a tiered (hierarchy) form, flowing down from sections to subsections, etc.

Figure 3.2. HTML DOM Locator

OER metadata were captured by the DOM Locator and parsed to extract the data embedded in metadata fields. This was accomplished with XPath extraction rules and regular expressions. Each rule was written specifically for an actual field of the metadata source.

To follow a path, a node was selected using XQuery. The following node types were used to locate metadata for easy extraction:

· Root node: in a node tree, the topmost node represents the root.

· Element nodes: they take the form <tag>data</tag>.

· Attribute nodes: attributes residing in an element's tag, i.e., <tag attribute=value>.

· Text nodes: the main content of element tags, i.e., the data in <tag>data</tag>.

Figure 3.3. OER Metadata DOM Tree

· XPath Extraction Rules.

XPath is a query language for locating metadata in Web-based documents. Its rule-based approach to information extraction makes XPath an effective way of extracting metadata. This algorithm captured metadata from GUI-based sources. In such scenarios, educational web pages are scraped and parsed at runtime, and the HTML DOM is formed. Figures 3.4 and 3.5 show how the extraction rule was used to identify and extract vital information from an HTML DOM.

Figure 3.4. Algorithm of the XPath extraction rule

Figure 3.5. Metadata Extraction Page
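A minimal sketch of rule-based XPath extraction is given below. It is not the algorithm of Figure 3.4: the page, the class names, and the rules are invented for illustration, and the standard library parser used here supports only a subset of XPath and requires well-formed markup, so real HTML pages would need a more tolerant parser.

```python
import re
import xml.etree.ElementTree as ET

# Field name -> (XPath rule, attribute to read, or None for the node's text).
RULES = {
    "title": (".//h1[@class='course-title']", None),
    "author": (".//span[@class='author']", None),
    "license": (".//a[@rel='license']", "href"),
    "level": (".//div[@id='details']/p", None),
}

PAGE = """<html><body>
  <h1 class="course-title">Intro to Biology</h1>
  <span class="author">A. Author</span>
  <div id="details"><p>Level: Undergraduate</p></div>
  <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a>
</body></html>"""


def extract(page, rules):
    """Apply one extraction rule per metadata field to a parsed page."""
    root = ET.fromstring(page)  # root node of the node tree
    record = {}
    for field, (path, attribute) in rules.items():
        node = root.find(path)  # element node selected by the rule
        if node is None:
            continue
        # Read either an attribute node or the text node of the element.
        value = node.get(attribute, "") if attribute else (node.text or "")
        record[field] = value.strip()
    return record


record = extract(PAGE, RULES)
# A regular expression cleans a value that the path alone cannot isolate.
match = re.match(r"Level:\s*(.+)", record.get("level", ""))
if match:
    record["level"] = match.group(1)
print(record)
```

Keeping the rules in a table separate from the extraction loop means that supporting another source page only requires a new set of rules, which matches the per-field rule writing described above.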

3.3 Metadata Quality

Metadata are pieces of information that facilitate easy access to OERs. In the OER space, if metadata contain too little information, users are likely to miss the learning resources, and content creators lose the resources (time and energy) invested in the creation process and maintenance.

Quality is a crucial issue in handling metadata associated with learning objects [20, 30]. This is because players in the OER ecosystem come from different areas and backgrounds, and often have limited experience with library practices and information technology. This situation gives rise to metadata errors capable of blocking access to learning resources.

Quality Metrics of OER Metadata.

This study adopts the Bruce & Hillman quality framework [29, 30] because it offers a rich set of well-known information quality parameters.

· Completeness: The extent to which metadata instances contain all of the vital information needed to give a broad representation of an OER. A good metadata instance should be able to describe the OER comprehensively.

· Provenance: The history of OER metadata is another factor in determining its quality. Information such as who created the instance and what changes the metadata has passed through can provide more insight into OER quality.

· Conformance to Expectations: describes the degree of importance of the terms characterising OER, and shows the applicability of these terms to the search and integration of OER metadata [28]. Because this estimate is not based on a survey linking the retrieved information to the metadata elements, the judgments are not influenced by the OER metadata description and its specification.
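The text does not give formulas for these metrics, so the following is only one plausible reading of the completeness metric: the share of the 15 Dublin Core elements that carry a non-empty value, scaled to the 0-10 range used in the evaluation. The function name and the sample record are hypothetical, and provenance and conformance are not covered.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def completeness(record, elements=DC_ELEMENTS, scale=10.0):
    """Score a record by the fraction of expected elements that are filled.

    Assumption: an element counts as filled when its value is non-empty
    after stripping whitespace. The result lies between 0 and `scale`.
    """
    filled = sum(1 for name in elements if str(record.get(name) or "").strip())
    return scale * filled / len(elements)


# Hypothetical record: three filled elements and one whitespace-only value.
sample = {"title": "Algebra I", "description": "Intro course", "format": "pdf", "rights": "  "}
print(completeness(sample))
```

A weighted variant, in which elements such as Title count more than Coverage, would fit the same interface by replacing the simple count with a weighted sum.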

3.4 Implementation and evaluation of quality metrics

The proposed metrics were tested on three different metadata samples. These comprise 2668 DC records of mostly automatically generated metadata from the Coursera and edX [31] course catalog APIs and a batch of 1000 LOM records automatically generated from OER Commons. The metrics were normalized to a 0-10 scale and implemented; Figure 3.8 summarizes the evaluation outcomes.

Figure 3.8. Results of OER metadata quality evaluation

Figure 3.9. Summary of OER metadata quality evaluation

3.5 Evaluation Results and Analysis

The purpose of this evaluation was to ascertain the quality of OER metadata by deploying a few well-known and tested information quality metrics. The extracted metadata were summarized into six (6) segments: general, rights, technical, lifecycle, classification, and educational. The metrics were evaluated in relation to the corresponding metadata classification.

An inspection of the collected samples showed a lack of completeness, redundant metadata, incorrect use of schema elements (especially Dublin Core (DC)), and incorrect representation. All records examined were incomplete, as none of them used all 15 DC and 9 LOM elements. Around eighty-four per cent (84%) of the records contained elements with duplicate metadata (e.g., Provider and Provider Set in the OER Commons metadata), and some elements were uncertain.

It was observed that some major metadata elements (i.e., Title, Description, Format, Type and Subject) were well populated, while other elements (i.e., Relation, Rights, Language and Coverage) were not fully completed. The assessment also shows that some elements were used frequently, whereas elements such as Range and Rights were not used often.

These quality metrics demonstrate that it is feasible to operationalize quality parameters and apply them to OER metadata evaluation. The outcome of this evaluation suggests that the proposed metrics are sensitive to quality aspects of OER metadata records, as their values reflect identifiable quality flaws.

4. Metadata integration and analysis

This chapter examines how metadata integration and metadata analysis techniques can offer more effective support for structured metadata for open educational resources (OER). The results are presented and discussed in the subsequent sections.

4.1 OER Metadata Integration

Metadata generation and consumption are trending upwards in many societies. The digitalisation of most daily activities creates an immense quantity of data (structured and unstructured) with large streams of metadata. The increasing volume and frequency of metadata in digital format enable its beneficial use, which requires data integration.

Metadata integration and analytics are becoming key elements of information systems, especially educational technology [6, 7]. Data integration is an essential metadata management technique that increases the worth of both analytics and metadata [26] by merging data from numerous sources, and by providing users with a coherent representation (textual or graphic). It is relevant to some application spheres such as e-commerce, data warehousing, enterprise information integration, spatial data management, etc.

Most metadata integration frameworks or architectures support global schemas, which provide a combined view of the primary sources. Metadata integration is realizable through two data processing approaches: first, combining dependent metadata of different origins on the same subject; second, combining independent metadata from the same metadata sources. These approaches justify the need for schema matching, that is, the identification of correspondences among schematic elements of metadata to define mapping patterns.

4.2 Template-based Algorithm

A template constitutes a body of rules for pulling information out of any data source. In the OER context, it is a useful method for managing large heterogeneous educational collections by classifying metadata into groups to form homogeneous sub-collections.

The template-based approach to processing metadata ensures that educational metadata contain labels set in advance, and classifies the metadata into groups based on the label sets. A trusted metadata output contains a set of labels corresponding to most of its metadata fields.

Figure 4.1. Template-based algorithm for metadata classification

An unstructured metadata output does not attach labels to most of its metadata fields. The identification of some metadata fields in an unstructured metadata source relies on the organization of typographic features and textual content.
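The grouping idea behind the template-based approach can be sketched as follows. This is not the algorithm of Figure 4.1: the template labels, the 0.5 coverage threshold and the sample records are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical template: the labels expected in a well-labelled metadata output.
TEMPLATE_LABELS = frozenset({"title", "description", "subject", "format", "rights"})


def classify(records, template=TEMPLATE_LABELS, threshold=0.5):
    """Group records by the set of template labels they carry.

    Assumption: a record counts as "structured" when it carries at least
    `threshold` of the template labels, and as "unstructured" otherwise.
    """
    groups = defaultdict(list)
    for record in records:
        labels = frozenset(record) & template      # template labels present in the record
        coverage = len(labels) / len(template)
        kind = "structured" if coverage >= threshold else "unstructured"
        groups[(kind, tuple(sorted(labels)))].append(record)
    return dict(groups)


records = [
    {"title": "A", "description": "x", "subject": "math"},
    {"title": "B", "subject": "art", "description": "y"},
    {"body": "free text with no labelled fields"},
]
groups = classify(records)
for key, members in groups.items():
    print(key, len(members))
```

Records that share a label set fall into the same homogeneous sub-collection, while unlabelled records are set aside for the typographic and textual analysis described above.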

4.3 Metadata Schema Matching Approach

Metadata schemas are structures describing the composition of metadata elements. This means metadata instances can be stored, accessed, and interpreted by computer applications and users. Aside from the technical components of metadata (e.g., field formats and types), a schematic description can, to some extent, address semantic issues concerning the contents and meanings of metadata, such as allowable values, cardinality, and integrity and referential constraints.

So far, many schema languages have been introduced in diverse application domains. Examples include the Structured Query Language (SQL) for relational schemas, Document Type Definition (DTD) and XML Schema Definition (XSD) for XML document schemas, and Web Ontology Language (OWL) for ontologies [14, 19]. These schemas have varying properties and capabilities, but they all, individually and collectively, contribute to the far-reaching use of schemas in data processing and management.

Figure 4.2. Schema Matching (Step 1)

Figure 4.3. Schema Matching (Step 2)

Schema matching facilitates the semantic mapping of corresponding elements among schema specimens [4, 14, 19]. As illustrated above, metadata mapping is crucial in enabling schema integration and metadata transformation to obtain an integrated template of OER metadata.

Accordingly, Figures 4.2 and 4.3 show a two-step metadata schema mapping that integrates a new source, S, into a pre-existing global schema, GS. Both S and GS were mapped into a distinct table to store educational metadata. As presented in Figure 4.2, the comparison of S and GS shows some pairs of corresponding elements between the two (2) schema specimens, such as General-Title, Educational-Subject, Relation-Source, Relation-Identifier, and Format-Technical.

The integration process requires that, based on the inter-schema relationships, similar elements be fused and dissimilar ones be merged under a coherent, integrated schema. The approach focuses on resolving discrepancies between the two (2) schema specimens resulting from the given inter-schema relationships (Figures 4.2 and 4.3). The resultant schema matching therefore supports the integration of OER metadata through the establishment of such inter-schema relationships.
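The merge step can be sketched as below, under stated assumptions. The element pairs come from the text, but treating General, Educational, Relation and Technical as the GS side is a guess, and the unmatched element "Audience" is hypothetical.

```python
def integrate(global_schema, source_schema, correspondences):
    """Integrate a source schema S into a global schema GS.

    Source elements with a correspondence are fused into the matching GS
    element; source elements without one are appended to the merged schema.
    Returns the merged schema and the S -> merged element mapping.
    """
    merged = list(global_schema)
    mapping = {}
    for element in source_schema:
        target = correspondences.get(element, element)
        if target not in merged:
            merged.append(target)          # dissimilar element: extend the schema
        mapping[element] = target          # similar element: fuse into GS element
    return merged, mapping


# Assumed orientation of the element pairs named in the text (S -> GS).
GS = ["General", "Educational", "Relation", "Technical"]
S = ["Title", "Subject", "Source", "Identifier", "Format", "Audience"]
CORRESPONDENCES = {
    "Title": "General",
    "Subject": "Educational",
    "Source": "Relation",
    "Identifier": "Relation",
    "Format": "Technical",
}

merged, mapping = integrate(GS, S, CORRESPONDENCES)
print(merged)
print(mapping)
```

In this sketch the correspondences are supplied by hand; discovering them automatically is the job of the matching operation described in the next section.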

4.4 Schema Information

In a match operation, schema information specifies the elements to be matched, the process, and the corresponding output. It is therefore reasonable to examine some typical schemas and their elements. Schemas can be provided in different formats and languages, such as SQL, UML, DTD, XSD, and OWL [4], depending on the field of application.

Input Information.

To resolve a given match problem, the available information needed to define the semantics of OER metadata schema elements is explored to discover their similarity. This exploration focused on schema information, instance data, and auxiliary information:

· Schema information: The input schemas provide information such as description, type, and schema structure. This information was examined to compare the semantic properties of the metadata schemas.

· Instance data: In generic data integration applications, schema matching uses metadata instances to characterize the elements of OER content. This approach was implemented to ensure that a realistic level of metadata integration was achieved.

· Auxiliary information: This covers all other kinds of information required for identifying similar elements between the schemas.

Output Information.

For example, given two input schemas (S1 and S2), the matching operation returns a mapping between the two schemas. This mapping groups schema elements into correspondences, each showing that certain elements of S1 are mapped to certain elements of S2. Each correspondence possesses a mapping expression, which specifies how the elements of both schemas are related.

· Semantics: The mapping expression uses simple relations such as set-oriented relationships (e.g., equivalence, overlap) and functions (e.g., arithmetic functions). Functions help in forming semantic correspondences, as they specify how to transform instances of S1 elements into those of S2 elements. Mapping expressions can be specified in any expression language; here, XQuery was used along with SQL.

· Directionality: Mapping expressions can go in one or more directions. In particular, those indicating equality relations, such as equivalence of sets, are unidirectional. A mapping is multi-directional if all mapping expressions of its correspondences are not equal.

Here, the purpose of matching is to obtain a schema template of OER metadata that captures the critical elements of the other metadata schemas. The result consists mostly of corresponding schema elements and specifies how the elements interrelate.
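A mapping made of correspondences, each with its own mapping expression, could be modelled as in the sketch below. The study expressed mappings in XQuery and SQL; Python callables stand in for them here, and the element names and the minutes-to-hours function are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


def identity(value):
    """Mapping expression for an equivalence relation: the value is kept as-is."""
    return value


@dataclass(frozen=True)
class Correspondence:
    """One S1 -> S2 correspondence together with its mapping expression."""
    source: str
    target: str
    expression: Callable[[Any], Any] = identity


def apply_mapping(instance, correspondences):
    """Transform an S1 instance into an S2 instance using the correspondences."""
    result = {}
    for item in correspondences:
        if item.source in instance:
            result[item.target] = item.expression(instance[item.source])
    return result


# Hypothetical mapping: one equivalence and one arithmetic function.
MAPPING = [
    Correspondence("title", "general_title"),
    Correspondence("duration_minutes", "duration_hours", lambda minutes: minutes / 60),
]

s1_instance = {"title": "Algebra", "duration_minutes": 90, "internal_id": 7}
s2_instance = apply_mapping(s1_instance, MAPPING)
print(s2_instance)
```

Elements without a correspondence, such as `internal_id` in this example, are simply not carried over to the S2 instance.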

Figure 4.4. Metadata Schema Matching Architecture

This architecture enables schema matching to access existing schema libraries and mappings of OER metadata, and to draw on other auxiliary information to help find correspondences. The libraries are generated and maintained by the matching algorithm. This enables the generic matching algorithm to present a uniform metadata schema matching representation.

4.5 OER Metadata Features Analysis

The features analysis takes a further look at the OER metadata retrieved from leading e-learning platforms to understand their main properties and structures. It further explores OER metadata features such as media format, condition of use, educational use, material type, primary users, and educational level.

NullCount Analysis.

This analysis computes the number of valueless records, i.e., records in which the specified column contains no value. Table 4.1 shows the results for null records. Several elements, such as “description”, “date” and “format”, have a combined value of 35%. Users expect values in fields, especially fields as basic as “description”.

Table 4.1. OER Metadata NullCount Analysis
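A null count of this kind can be sketched in a few lines. The records are hypothetical, and treating missing keys, `None` and whitespace-only strings all as null is an assumption about what "no value" covers.

```python
def null_counts(records, fields):
    """Return {field: (null_count, null_percentage)} over a list of records.

    Assumption: a value is null when the key is missing, the value is None,
    or the value is empty after stripping whitespace.
    """
    total = len(records)
    report = {}
    for field in fields:
        nulls = sum(
            1 for record in records
            if record.get(field) is None or not str(record[field]).strip()
        )
        report[field] = (nulls, 100.0 * nulls / total if total else 0.0)
    return report


# Hypothetical records with missing, None, empty and blank values.
sample_records = [
    {"description": "x", "date": None},
    {"description": "", "date": "2018"},
    {"description": "y"},
    {"description": " ", "date": "2017"},
]
report = null_counts(sample_records, ["description", "date"])
print(report)
```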

Pattern Analysis.

The pattern analysis computes the different formats used for representing values. The values may consist of letters, numerals, or special characters. This analysis is essential for inspecting the metadata values that must match a certain fixed syntax.

Upon running the pattern algorithm, it was observed that 87% of the values match the required syntax. The different date fields offer a good opportunity to apply the pattern analysis. Table 4.2 presents the frequent patterns used for representing dates.

Table 4.2. OER metadata date patterns

Another challenge observed with dates is the inconsistency of patterns, which makes searching difficult. This kind of analysis is useful for developing normalization scripts and for populating value vocabularies.
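One common way to implement such a pattern analysis is to reduce every value to a shape string and count the shapes. The sketch below assumes a convention in which digits become `9` and letters become `A`; this convention and the sample dates are illustrative and not taken from the study.

```python
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: digits -> 9, letters -> A, the rest kept."""
    shape = []
    for character in value:
        if character.isdigit():
            shape.append("9")
        elif character.isalpha():
            shape.append("A")
        else:
            shape.append(character)        # separators and special characters
    return "".join(shape)


def pattern_frequencies(values):
    """Count how often each shape occurs in a column of values."""
    return Counter(value_pattern(value) for value in values)


# Hypothetical date values written in inconsistent formats.
dates = ["2018-09-01", "2017-01-15", "Sep 2018", "01/09/2018"]
frequencies = pattern_frequencies(dates)
for shape, count in frequencies.most_common():
    print(shape, count)
```

The resulting shape counts show at a glance which formats a normalization script would have to handle.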

Case Analysis.

This analysis presents a usage summary of capitalized and non-capitalized alphabetic characters. Applying this analysis enables one to check the consistency level of metadata input.

Table 4.3. Analysis of upper- and lower-case characters
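A case analysis can be approximated by sorting each value into one of four buckets, as in the sketch below. The bucket names and sample values are hypothetical, and the study's tool may summarize case usage differently.

```python
def case_summary(values):
    """Count values that are all upper case, all lower case, mixed, or letter-free."""
    summary = {"upper": 0, "lower": 0, "mixed": 0, "no_letters": 0}
    for value in values:
        letters = [character for character in value if character.isalpha()]
        if not letters:
            summary["no_letters"] += 1
        elif all(character.isupper() for character in letters):
            summary["upper"] += 1
        elif all(character.islower() for character in letters):
            summary["lower"] += 1
        else:
            summary["mixed"] += 1
    return summary


# Hypothetical "format" values entered with inconsistent capitalization.
formats = ["PDF", "video", "Text/Html", "2018"]
summary = case_summary(formats)
print(summary)
```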

Length Analysis.

The length analysis computes the number of characters used in metadata fields. When applied to the field “description”, the analyzer reports 9057 values consisting of 1341 characters. The same applies to “source”, with 3179 values consisting of 308 characters.

Table 4.4. OER Metadata Length Analysis
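A length analysis of a single field might be sketched as follows. The text does not say which statistics the analyzer reports, so count, minimum, maximum and mean are assumed here, and empty values are skipped.

```python
def length_summary(values):
    """Summarize character lengths of the non-empty values in a field."""
    lengths = [len(value) for value in values if value]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Hypothetical field values, including one empty value that is ignored.
lengths_report = length_summary(["abc", "a", "", "abcde"])
print(lengths_report)
```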

Other OER Metadata Features

· Material type: describes how OER contents are structured for delivery. These structures are adjustable to the needs and modes of consumption of the target audience.

· Media format: the path or medium through which learning materials are made accessible to the user community.

· Condition of use: this analysis shows the conditions guiding the consumption of learning materials. These conditions are necessary to enforce compliance with intellectual property provisions.

· Primary user: shows the groups of active users of OER. The result reveals that users cut across every stratum of the educational system: student, administrator, teacher, etc.

· Educational use: this analysis provides some insight into why users consume OERs. The reasons range from professional development and assessment to curriculum development and instruction support.

· Educational level: this analysis categorizes the user community by age group and school grade, from high school to graduate and career levels.

4.6 OER Metadata Warehouse Architecture

This metadata architecture is a snapshot of the mini-metadata warehouse created to store OER metadata. The system is centered on the well-known ETL (extract-transform-load) methodology [33], comprising four main components: data source, staging area, data marts, and a user interface.

Figure 4.5. OER Metadata Architecture

The methodology captures OER metadata from source data files, transforms the raw streams of metadata into a structured format compatible with the predefined metadata warehouse format, and transports the resultant metadata to the suitable data warehouse partition(s).
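The extract-transform-load flow described above can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. The table name, column set, normalization rules and sample rows are all assumptions; the actual warehouse layout in Figure 4.5 is not reproduced here.

```python
import sqlite3

# Hypothetical raw metadata, as it might arrive from source data files.
RAW = [
    {"Title": " Intro to Algebra ", "Format": "PDF", "Source": "OER Commons"},
    {"Title": "Cell Biology", "Format": "video", "Source": "edX"},
]


def extract(source):
    """Extract: yield raw records one by one from the data source."""
    yield from source


def transform(raw):
    """Transform: trim whitespace and lower-case the format (assumed rules)."""
    return (raw["Title"].strip(), raw["Format"].strip().lower(), raw["Source"].strip())


def load(connection, rows):
    """Load: write the structured rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS oer_metadata (title TEXT, format TEXT, source TEXT)"
    )
    connection.executemany("INSERT INTO oer_metadata VALUES (?, ?, ?)", rows)
    connection.commit()


def run_etl(source, connection):
    """Run the three stages in order."""
    load(connection, [transform(record) for record in extract(source)])


connection = sqlite3.connect(":memory:")   # in-memory database for the sketch
run_etl(RAW, connection)
stored = connection.execute(
    "SELECT title, format FROM oer_metadata ORDER BY title"
).fetchall()
print(stored)
```

Keeping the three stages as separate functions mirrors the separation between data source, staging area and data marts: each stage can be replaced without touching the others.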

Conclusion

This chapter concludes this study by summarising issues discussed in prior chapters and putting them in perspective, while setting direction for future undertakings.

As stated earlier, this study centers on designing an architecture for integrating and analysing open educational resources (OER) metadata. The standpoint of notable literature on metadata is that the management of educational metadata is essential for several reasons. The first is availability, that is, access to metadata with minimal effort. Quality was another major point examined in this thesis; this speaks to the capacity of metadata to provide quality information. Metadata preservation is essential for long-term planning and for ensuring data consistency. Also, public domain licensing is necessary for providing the user community with quality OER.

The OER community distinguishes educational metadata as descriptive in nature, and thus useful for identifying and describing the contents of educational collections and associated resources, as distinct from other types. It follows that this metadata type references information essential for managing collections of educational resources. For preservation purposes, a further category of metadata was devised for the management of informational resources. This includes technical metadata, which specifies the behaviours and modes of use of OER metadata.

A lifecycle involves a process or pattern of developmental changes. The metadata lifecycle comprises three (3) stages. The first is collection: important data are identified and captured in the central repository. The second is maintenance of the data architecture to reflect changes in the educational environment. The last stage deals with the delivery of good-quality metadata, using the right deployment tools to deliver content to users in the appropriate form.

As shown earlier, Greenberg [10] proposed the idea of automatic metadata generation and concluded that metadata can indeed be generated through extraction and harvesting, which depend on computational methods to simplify the metadata creation process. Extraction methods such as automatic indexing, information retrieval, and web scraping were used along with metadata harvesting to generate OER metadata from leading educational platforms.

Prior to this, we surveyed current metadata management systems for educational documents to gain more insight into how these methods work. The tools considered were: Apache Tika (meta-tag harvesting), Apache POI-Text Extractor (content extraction), Omeka (automatic indexing), Data Fountains (extrinsic auto data generation), and DSpace (social tagging) [17].

After generating the OER metadata, we moved to the next segment: metadata quality. Quality is a critical issue in managing metadata, especially that of learning resources. As previously argued, metadata offer a better means of accessing e-learning resources. In the OER space, if metadata contain too little information, users are likely to miss the resources, and content creators will lose the time and energy invested in their creation and maintenance.

To evaluate the quality of the previously generated OER metadata, we thought it necessary to apply proven metadata quality metrics. This motivated the adoption of the Bruce & Hillman information quality framework. This framework consists of several quality parameters, but owing to scope restrictions we settled on three quality indicators: completeness, conformance to expectations, and provenance. Completeness measures how well metadata elements provide all the necessary descriptive data about educational resources. Provenance is the reputational gauge of OER metadata. The conformance factor measures how well OER metadata satisfy the requests of the user community.

The proposed metrics were applied to three different data samples: 2668 DC records from the Coursera and edX course catalog APIs and a batch of 1000 LOM records automatically generated from OER Commons. The aforementioned metrics were normalized to a scale of 0 to 10 for easy assessment. The data set was summarized into six (6) segments: general, rights, technical, lifecycle, classification, and educational.

Metrics estimation was done vis-à-vis these segments. The outcomes of the assessment show deficiencies such as a lack of completeness, redundant metadata, incorrect use of schema elements (especially Dublin Core (DC)), and incorrect representation. Virtually all records examined were incomplete, as the majority used fewer than the 15 DC and 9 LOM elements. Ninety-four per cent (94%) of the records contained duplicated elements (Provider and Provider Set in OER Commons); some date elements were uncertain.

Furthermore, it was observed that the main elements (i.e., Title, Type, Description, Format, and Subject) were well populated, while other elements (i.e., Relation, Rights, Language and Coverage) were half populated. These quality metrics operationalize quality parameters and apply them to OER metadata evaluation. The results of this evaluation suggest that the proposed metrics were quite sensitive, as their values reflect identifiable quality flaws in our dataset.

...
