PD:Internal CCSDS 652.0-M-1 audit

Date of audit report: June 2016

Version of audit report: 1.0

This is the first audit of the public domain project. It is the starting point for a long-term program to develop and install the organizational and technical methods required of a long-term digital archive.

This audit was conducted according to the Recommended Practice CCSDS 652.0-M-1, AUDIT AND CERTIFICATION OF TRUSTWORTHY DIGITAL REPOSITORIES, published in 2011 by the Consultative Committee for Space Data Systems (CCSDS), the same committee that published the Reference Model for an Open Archival Information System (OAIS).

It was clear from the beginning that this first audit would reveal many weak points, unaddressed problems and essential requirements that are not fulfilled.

Therefore the large number of requirements marked as not fulfilled (red) should not lead to the verdict that the public domain project is entirely untrustworthy. The existence of this audit already gives the public more information for evaluating trustworthiness than many other archives provide.

The result of this first audit is the foundation for developing requirements for future processes, technical methods and investments. It also helps to manage these development projects, as it provides a metric to measure the impact of a proposal.

This audit will be replaced by a more recent audit once substantial improvements have been implemented. This makes it possible to track the effort the project invests in long-term preservation and in its trustworthiness.

This audit documentation is structured in a similar way as the CCSDS 652.0-M-1 Recommended Practice document:

  • introduction
  • overview of audit and certification criteria
  • conclusion
  • catalog of requirements

OVERVIEW OF AUDIT AND CERTIFICATION CRITERIA

A TRUSTWORTHY DIGITAL REPOSITORY

Definition of a trustworthy digital repository as given in the CCSDS 652.0-M-1 Recommended Practice document:

A trustworthy digital repository will understand threats to and risks within its systems. Constant monitoring, planning, and maintenance, as well as conscious actions and strategy implementation will be required of repositories to carry out their mission of digital preservation. All of these present an expensive, complex undertaking that depositors, stakeholders, funders, the Designated Community, and other digital repositories will need to rely on in the greater collaborative digital preservation environment that is required to preserve the vast amounts of digital information generated now and into the future.


DEFINITIONS

Each requirement is marked with a color to show its fulfillment status:

  • Requirements fulfilled (green)
  • Minor requirements are not fulfilled (orange)
  • Essential requirements not fulfilled (red)

These definitions from the original audit document all apply to this internal audit:

For a better understanding some paragraphs of the CCSDS 652.0-M-1 Recommended Practice are reproduced here.

CONFORMANCE

Original text: An archive that conforms to this Recommended Practice shall have satisfied the auditor on each of the requirements.

EVIDENCE

Each metric in the Recommended Practice has associated with it informative text under the heading Examples of Ways the Repository Can Demonstrate It Is Meeting This Requirement providing examples of the evidence which might be examined to test whether the repository satisfies the metric. These examples are illustrative rather than prescriptive, and the lists of possible evidence are not exhaustive.

NOMENCLATURE

The following conventions apply for the normative specifications in this Recommended Practice:

a) the words ‘shall’ and ‘must’ imply a binding and verifiable specification;
b) the word ‘should’ implies an optional, but desirable, specification;
c) the word ‘may’ implies an optional specification;
d) the words ‘is’, ‘are’, and ‘will’ imply statements of fact.

ACRONYMS AND ABBREVIATIONS

AIP Archival Information Package (defined in reference [1])
CCSDS Consultative Committee for Space Data Systems
DEDSL Data Entity Specification Language
DIP Dissemination Information Package (defined in reference [1])
FITS Flexible Image Transport System
GIS Geographic Information System
ISO International Organization for Standardization
OAIS Open Archival Information System (see reference [1])
PDI Preservation Description Information (defined in reference [1])
SIP Submission Information Package (defined in reference [1])
TEI Text Encoding Initiative
UML Unified Modeling Language
XML Extensible Markup Language


REFERENCES

[1] Reference Model for an Open Archival Information System (OAIS).

For convenience the full text of the recommended practice CCSDS 652.0-M-1 AUDIT AND CERTIFICATION OF TRUSTWORTHY DIGITAL REPOSITORIES is readable on this wiki page: PD:CCSDS_652.0-M-1. Every requirement is directly linked to the corresponding explanation in the CCSDS 652.0-M-1 Recommended Practice.

The original document is published on the CCSDS Website: CCSDS Recommended Practices (Magenta Books)


CONCLUSION AND FIELDS OF NON CONFORMANCE

OVERVIEW

Of the 108 normative metrics the final status is the following:

Metrics with all requirements fulfilled (green): 16
Metrics where minor requirements are not fulfilled (orange): 15
Metrics with essential requirements not fulfilled (red): 77


FIELDS OF NON CONFORMANCE

ESSENTIAL DEFINITIONS

Within the project and among its members there is a shared understanding of the designated communities, but it is not precisely defined, and therefore the knowledge base of these communities is not known.

The same is true for the definition of the content information that has to be preserved.

REPRESENTATION INFORMATION

An example of an underdeveloped area is representation information: at the beginning of this audit the project lacked awareness of the underlying problems and of the concepts for handling them. The consistent use of open standards and open source software makes this somewhat less critical. But because no representation information is held at all, it still creates a large long-term risk for the repository.

This whole topic has to be addressed in the near future.

MANAGEMENT AND PRESERVATION PLANNING

The area of management tasks, strategic planning, policy development and tracking is also underdeveloped. There is no risk assessment in place, and consequently there are no processes to observe the technical and legal environment of the repository on a regular basis.

Also missing is a system to plan, manage and track work packages, milestones, issues etc. to support the further development of management and preservation planning.

Furthermore, there is no system that enables end users and producers to submit feedback and lets submitters follow the reactions and actions taken in response.

DIGITAL OBJECT MANAGEMENT

It was already known that the handling of the digital objects in the repository has its risks that have not been addressed yet.

The digital objects are at risk because there is no system to prevent unintended deletion of objects, there is no off-site backup and there is no system and associated monitoring to guarantee the bit-level correctness of the digital objects now and in the future.
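One common way to monitor bit-level correctness is a checksum manifest that is recorded once and re-verified on a schedule. The following is a minimal sketch of that idea, not a description of anything the project has implemented; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file below root (relative POSIX path) to its digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current state of root against a stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that no longer exist (e.g. unintended deletion).
        "missing": sorted(set(manifest) - set(current)),
        # Files whose content no longer matches the recorded digest.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files that appeared without being recorded.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

Such a check only helps if the manifest itself is kept apart from the storage it describes, for example together with an off-site copy.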

Also the system to create identifiers for AIPs is not documented and is not ideal for the scalability of the repository.


CONCLUSION

As expected, many requirements are not fulfilled. However, the value of this audit is high because it revealed severely underdeveloped areas inside the project and thereby raises awareness of these problems. Twelve of the issues can easily be fixed by documenting what is currently implemented.

It is of high importance to fix the lack of the essential definitions about content information and designated community.

A large field of work is the regular maintenance and observation tasks of management and preservation planning. They have to be defined, documented, executed and reviewed.

On the technical side, the completely missing representation information is a substantial shortcoming for a long-term repository. If this information is collected in the near future and maintained thereafter, the real risk of losing understandability is relatively low.

ORGANIZATIONAL INFRASTRUCTURE

The catalog of requirements starts with this chapter. Every requirement is explained in the CCSDS 652.0-M-1 document; this explanation can be reached directly by clicking on the heading of the requirement.


GOVERNANCE AND ORGANIZATIONAL VIABILITY

The repository shall have a mission statement that reflects a commitment to the preservation of, long term retention of, management of, and access to digital information.

Requirements fulfilled

Bylaws §2 of the Swiss Foundation Public Domain

The repository shall have a Preservation Strategic Plan that defines the approach the repository will take in the long-term support of its mission.

Essential requirements not fulfilled

The repository shall have an appropriate succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.

Essential requirements not fulfilled

There is no succession plan for the case that the repository ceases to operate or the governing or funding institution substantially changes its scope.

Escrow arrangements are not needed because of the consistent use of free and open source software.

The repository shall monitor its organizational environment to determine when to execute its succession plan, contingency plans, and/or escrow arrangements.

Minor requirements are not fulfilled

Financial monitoring is done every year in retrospect to fulfill the accounting requirements for a charitable foundation in Switzerland. This includes an external audit by certified lawyers and is supervised by the Eidgenössische Stiftungsaufsicht.

Monitoring the organizational environment is not in place and fiscal planning is underdeveloped.

The repository shall have a Collection Policy or other document that specifies the type of information it will preserve, retain, manage, and provide access to.

Requirements fulfilled

Bylaws §2 of the Swiss Foundation Public Domain


ORGANIZATIONAL STRUCTURE AND STAFFING

The repository shall have identified and established the duties that it needs to perform and shall have appointed staff with adequate skills and experience to fulfill these duties.

Essential requirements not fulfilled

The repository shall have identified and established the duties that it needs to perform.

Essential requirements not fulfilled

The repository shall have the appropriate number of staff to support all functions and services.

Essential requirements not fulfilled

The repository shall have in place an active professional development program that provides staff with skills and expertise development opportunities.

Essential requirements not fulfilled


PROCEDURAL ACCOUNTABILITY AND PRESERVATION POLICY FRAMEWORK

The repository shall have defined its Designated Community and associated knowledge base(s) and shall have these definitions appropriately accessible.

Essential requirements not fulfilled

The repository shall have Preservation Policies in place to ensure its Preservation Strategic Plan will be met.

Essential requirements not fulfilled

The repository shall have mechanisms for review, update, and ongoing development of its Preservation Policies as the repository grows and as technology and community practice evolve.

Essential requirements not fulfilled

The repository shall have a documented history of the changes to its operations, procedures, software, and hardware.

Essential requirements not fulfilled

The repository shall commit to transparency and accountability in all actions supporting the operation and management of the repository that affect the preservation of digital content over time.

Minor requirements are not fulfilled

Reports of financial and technical audits:
The publications of the Foundation do not yet include financial reports because the foundation is quite new and the first report, covering 2014, has only just been finished. It is planned to publish them.

The first technical audit is the one that is presented in this document. It is published publicly on this page: Internal Audit (CCSDS_652.0-M-1)

Disclosure of governance documents:
As can be seen from other requirements, there are no governance documents yet, so there is nothing to make publicly available.

Contracts and agreements with providers of funding and critical services:
They are not publicly available yet.

The repository shall define, collect, track, and appropriately provide its information integrity measurements.

Essential requirements not fulfilled

The repository shall commit to a regular schedule of self-assessment and external certification.

Essential requirements not fulfilled


FINANCIAL SUSTAINABILITY

The repository shall have short- and long-term business planning processes in place to sustain the repository over time.

Essential requirements not fulfilled

The repository shall have financial practices and procedures which are transparent, compliant with relevant accounting standards and practices, and audited by third parties in accordance with territorial legal requirements.

Requirements fulfilled

The auditor has seen the audited annual financial statements for the years 2014 and 2015. As shown above, these statements are not publicly available, which should be changed.

The repository shall have an ongoing commitment to analyze and report on financial risk, benefit, investment, and expenditure (including assets, licenses, and liabilities).

Essential requirements not fulfilled


CONTRACTS, LICENSES, AND LIABILITIES

The repository shall have and maintain appropriate contracts or deposit agreements for digital materials that it manages, preserves, and/or to which it provides access.

Essential requirements not fulfilled

The repository shall have contracts or deposit agreements which specify and transfer all necessary preservation rights, and those rights transferred shall be documented.

Essential requirements not fulfilled

The repository shall have specified all appropriate aspects of acquisition, maintenance, access, and withdrawal in written agreements with depositors and other relevant parties.

Essential requirements not fulfilled

The repository shall have written policies that indicate when it accepts preservation responsibility for contents of each set of submitted data objects.

Essential requirements not fulfilled

The repository shall have policies in place to address liability and challenges to ownership/rights.

Requirements fulfilled

The repository shall track and manage intellectual property rights and restrictions on use of repository content as required by deposit agreement, contract, or license.

Requirements fulfilled

The goal of the Public Domain Project is to make digitized audio content accessible. This requires a thorough check of the intellectual property rights of each work.

The result of this effort can be seen on every detail information page, such as this example (Gramophone-14678-b45142). Every work includes information about the copyright status in Switzerland, the European Union and the United States, including the year in which the work enters the public domain. With this information it is possible to check once a year which works have entered the public domain and can be made accessible. Because only the year is relevant, one check per year is enough.
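The yearly check described above can be sketched as a simple filter over the recorded public domain years. This is an illustration only: the `Work` record, the jurisdiction codes and the function names are hypothetical and do not reflect how the wiki actually stores this information.

```python
from dataclasses import dataclass


@dataclass
class Work:
    identifier: str
    # Hypothetical field: jurisdiction code -> first year in the public domain.
    public_domain_year: dict[str, int]


def newly_public_domain(works: list[Work], jurisdiction: str, year: int) -> list[str]:
    """Identifiers of works that enter the public domain exactly in `year`."""
    return [
        work.identifier
        for work in works
        if work.public_domain_year.get(jurisdiction) == year
    ]


def accessible_works(works: list[Work], jurisdiction: str, year: int) -> list[str]:
    """Identifiers of works already in the public domain in `year`."""
    return [
        work.identifier
        for work in works
        if jurisdiction in work.public_domain_year
        and work.public_domain_year[jurisdiction] <= year
    ]
```

A work with no recorded year for a jurisdiction is treated as not accessible there, which is the cautious default for a project that may only publish public domain works.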

DIGITAL OBJECT MANAGEMENT

INGEST: ACQUISITION OF CONTENT

The repository shall identify the Content Information and the Information Properties that the repository will preserve.

Essential requirements not fulfilled

The repository shall have a procedure(s) for identifying those Information Properties that it will preserve.

Essential requirements not fulfilled

The repository shall have a record of the Content Information and the Information Properties that it will preserve.

Essential requirements not fulfilled

The repository shall clearly specify the information that needs to be associated with specific Content Information at the time of its deposit.

Minor requirements are not fulfilled

The wiki template Audio_file shows all the information needed. But there is limited documentation about it and how to handle it. The section on references is appropriate, but the problems start with the date fields because no date format is specified. It is also problematic that there is no information about the vocabulary to use: should it be a controlled vocabulary (and if so, which one) or free text, and are there different requirements for each field?

The repository shall have adequate specifications enabling recognition and parsing of the SIPs.

Essential requirements not fulfilled

The repository shall have mechanisms to appropriately verify the identity of the Producer of all materials.

Minor requirements are not fulfilled

To upload AIPs (Flac files) to the archival storage, the FTP protocol is used with user authentication. The user name of the person who uploaded a certain file is visible on the file system of the archival storage as the UNIX owner of the file. This information is not visible to the designated community and therefore cannot be verified by it.

The history of the associated PDI and the identity of its creator is publicly visible and can be verified by everyone. The PDI is stored and edited in wiki pages and MediaWiki requires a password protected user login to edit this information.

The repository shall have an ingest process which verifies each SIP for completeness and correctness.

Essential requirements not fulfilled

The repository shall obtain sufficient control over the Digital Objects to preserve them.

The repository shall provide the producer/depositor with appropriate responses at agreed points during the ingest processes.

Requirements fulfilled

There are no agreed points with the depositors at which responses are necessary. But a lot of information about ingested objects is available through the category listing (each depositor has its own category listing all of its objects), the recent changes page of the wiki and other means (e.g. watchlists). Reports about the ingestion process are usually produced annually for the financial supporters.

The repository shall have contemporaneous records of actions and administration processes that are relevant to content acquisition.

Essential requirements not fulfilled


INGEST: CREATION OF THE AIP

The repository shall have for each AIP or class of AIPs preserved by the repository an associated definition that is adequate for parsing the AIP and fit for long-term preservation needs.

Essential requirements not fulfilled

The repository shall be able to identify which definition applies to which AIP.

Essential requirements not fulfilled

The repository shall have a definition of each AIP that is adequate for long-term preservation, enabling the identification and parsing of all the required components within that AIP.

Essential requirements not fulfilled

The repository shall have a description of how AIPs are constructed from SIPs.

Essential requirements not fulfilled

The process is not documented, but essentially the SIP is the Flac file that is uploaded to the storage server; together with the detailed description in the wiki it forms the AIP. So the AIP consists of the wiki page and the linked Flac file.

The repository shall document the final disposition of all SIPs. In particular the following aspect must be checked.

The repository shall follow documented procedures if a SIP is not incorporated into an AIP or discarded and shall indicate why the SIP was not incorporated or discarded.

Minor requirements are not fulfilled

The repository shall have and use a convention that generates persistent, unique identifiers for all AIPs.

Essential requirements not fulfilled

Each AIP is identified by a string composed in the following way: <Label>-<Catalog number>-<Order number>

Example:

  • Label: Homocord
  • Catalog number: B 367
  • Order number: M 17234

This results in the URL for the detailed information page: http://pool.publicdomainproject.org/index.php/Homocord-b367-m17234

And the according Flac file name is: homocord-b367-m17234.flac
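The composition rule above can be sketched in a few lines. The normalization shown here (removing whitespace and lower-casing each part) is inferred from the single Homocord example and is an assumption, since the scheme itself is not documented; the function names are hypothetical.

```python
import re


def make_identifier(label: str, catalog_number: str, order_number: str) -> str:
    """Build <label>-<catalog number>-<order number> from the three parts."""
    parts = [label, catalog_number, order_number]
    # Assumed normalization: drop all whitespace and use lower case.
    cleaned = [re.sub(r"\s+", "", part).lower() for part in parts]
    if any(not part for part in cleaned):
        raise ValueError("label, catalog number and order number are all required")
    return "-".join(cleaned)


def flac_file_name(identifier: str) -> str:
    """File name of the Flac file that belongs to an identifier."""
    return identifier + ".flac"


print(flac_file_name(make_identifier("Homocord", "B 367", "M 17234")))
```

The sketch raises an error when a part is missing, which makes the open question of unreleased records (no catalog or order number) visible instead of silently producing an incomplete identifier.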

The repository shall uniquely identify each AIP within the repository.

Requirements fulfilled

The repository shall have unique identifiers.

Requirements fulfilled

This holds provided that no label uses conflicting catalog/order numbers. Such a conflict is unlikely, but it could happen.

The repository shall assign and maintain persistent identifiers of the AIP and its components so as to be unique within the context of the repository.

Requirements fulfilled

The described naming scheme is unique in the context of 78 rpm records, which are the only items currently preserved.

Documentation shall describe any processes used for changes to such identifiers.

Essential requirements not fulfilled

There is no documentation.

The repository shall be able to provide a complete list of all such identifiers and do spot checks for duplications.

Minor requirements are not fulfilled

A complete list of all used identifiers is accessible via the category listing Audio file licenses.

There is no automated way to check for duplication. In the wiki it should not be possible to generate duplicates because the page names would create a conflict. There can be only one page with a certain name because no hierarchy is in use. But on the storage server it would be possible to accidentally create duplicates because of the manual upload process and the hierarchical organization (Folder structure by genre/artist).
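An automated spot check for duplicates on the storage server could look roughly like the following. It is a sketch under assumptions: it presumes the Flac files sit below one root directory and treats file names that differ only in letter case as the same identifier; `find_duplicate_names` is a hypothetical helper, not an existing project tool.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(root) -> dict[str, list[Path]]:
    """Group Flac files below root by file name and return names used more than once."""
    by_name: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        # Compare names case-insensitively and ignore everything but Flac files.
        if path.is_file() and path.suffix.lower() == ".flac":
            by_name[path.name.lower()].append(path)
    return {
        name: sorted(paths)
        for name, paths in by_name.items()
        if len(paths) > 1
    }
```

Because the folder hierarchy (genre/artist) is ignored in the comparison, the same identifier uploaded into two different folders is reported with both locations.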

The system of identifiers shall be adequate to fit the repository's current and foreseeable future requirements such as numbers of objects.

Essential requirements not fulfilled

This naming scheme obviously depends on the naming of the collected items and is tailored to released records. This results in several problems:

  • It is unclear how to handle unreleased records (No order/catalog number)
  • The archive is open for other recording formats like cylinders, open reel tape and even motion pictures where this naming scheme is not usable
  • The naming scheme does not describe how to handle retouched versions (clean master) of the raw digitization (master) where both have to be searchable, distinguishable and accessible

The repository shall have a system of reliable linking/resolution services in order to find the uniquely identified object, regardless of its physical location.

Minor requirements are not fulfilled

The naming convention in use is suitable to meet this requirement if only shellac records are archived. The missing documentation is problematic.

The repository shall have access to necessary tools and resources to provide authoritative Representation Information for all of the digital objects it contains. In particular the following aspects must be checked.

Essential requirements not fulfilled

The repository shall have tools or methods to identify the file type of all submitted Data Objects.

Requirements fulfilled

The Unix tool file and other more format specific tools are available on the servers.
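Tools such as file recognize formats by signature bytes at the start of a file. For the Flac files held as AIPs, the same idea can be sketched directly: a native FLAC stream begins with the four-byte marker "fLaC". The helper name below is hypothetical, and the check is deliberately narrow; a file with any data placed in front of the marker would be rejected.

```python
FLAC_MARKER = b"fLaC"  # stream marker at the start of a native FLAC file


def looks_like_flac(path: str) -> bool:
    """Return True if the file starts with the FLAC stream marker."""
    with open(path, "rb") as handle:
        return handle.read(len(FLAC_MARKER)) == FLAC_MARKER
```

This only identifies the file type; it does not validate the rest of the stream.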

The repository shall have tools or methods to determine what Representation Information is necessary to make each Data Object understandable to the Designated Community.

Essential requirements not fulfilled

No tools or methods in use.

The repository shall have access to the requisite Representation Information.

Requirements fulfilled

Because the Public Domain Project only uses Free and Open Source Software (FOSS), access to all requisite Representation Information is guaranteed.

The repository shall have tools or methods to ensure that the requisite Representation Information is persistently associated with the relevant Data Objects.

Essential requirements not fulfilled

This strong requirement is not fulfilled.

The repository shall have documented processes for acquiring Preservation Description Information (PDI) for its associated Content Information and acquire PDI in accordance with the documented processes. In particular the following aspects must be checked.

Essential requirements not fulfilled

The repository shall have documented processes for acquiring PDI.

Essential requirements not fulfilled

The repository shall execute its documented processes for acquiring PDI.

Essential requirements not fulfilled

The repository shall ensure that the PDI is persistently associated with the relevant Content Information.

Requirements fulfilled

At the moment the PDI is stored inside the MediaWiki and is permanently linked to the audio file (which does not contain PDI). Both the file name of the content information and the wiki page use the same naming scheme, so the association is obvious.

The repository shall ensure that the Content Information of the AIPs is understandable for their Designated Community at the time of creation of the AIP. In particular the following aspects must be checked.

Essential requirements not fulfilled

Repository shall have a documented process for testing understandability for their Designated Communities of the Content Information of the AIPs at their creation.

Essential requirements not fulfilled

The repository shall execute the testing process for each class of Content Information of the AIPs.

Essential requirements not fulfilled

The repository shall bring the Content Information of the AIP up to the required level of understandability if it fails the understandability testing.

Essential requirements not fulfilled

The repository shall verify each AIP for completeness and correctness at the point it is created.

Essential requirements not fulfilled

There is no verification process in use. Essentially the SIP is created by the same person who ingests it and creates the AIP. For example, no four-eyes principle is applied.

The repository shall provide an independent mechanism for verifying the integrity of the repository collection/content.

Essential requirements not fulfilled

The repository shall have contemporaneous records of actions and administration processes that are relevant to AIP creation.

Minor requirements are not fulfilled

For the PDI, records of actions are captured automatically. Every change to a wiki page is logged, the difference from the previous version can be inspected and the old version can be restored if needed. Here is an example of what the version history looks like: Version history of Columbia-a3996-81215

But there is no such mechanism, or any other process, to capture records of actions for the content information (the Flac files).


PRESERVATION PLANNING

The repository shall have documented preservation strategies relevant to its holdings.

Essential requirements not fulfilled

There are no documented preservation strategies.

The repository shall have mechanisms in place for monitoring its preservation environment.

Essential requirements not fulfilled

There are no formal mechanisms for monitoring the preservation environment. But the active people in the project are in regular contact with groups of the designated communities. This is done by attending conferences, assemblies and regular meetings of user groups. The recommendations on formats and media published by the associations of archives or libraries are also followed informally.

The repository shall have mechanisms in place for monitoring and notification when Representation Information is inadequate for the Designated Community to understand the data holdings.

Essential requirements not fulfilled

The repository shall have mechanisms to change its preservation plans as a result of its monitoring activities.

Essential requirements not fulfilled

This and the next requirement fail because there are no monitoring activities in place on which a reaction could be defined.

The repository shall have mechanisms for creating, identifying or gathering any extra Representation Information required.

Essential requirements not fulfilled

The repository shall provide evidence of the effectiveness of its preservation activities.

Essential requirements not fulfilled


AIP PRESERVATION

The repository shall have specifications for how the AIPs are stored down to the bit level.

Minor requirements are not fulfilled

All file formats used for AIPs or other relevant information are well-documented open standards down to the bit level. The representation information is not available locally and is not linked to the AIPs.

The repository shall preserve the Content Information of AIPs.

Essential requirements not fulfilled

No documented work flows.

The repository shall actively monitor the integrity of AIPs.

Essential requirements not fulfilled

The repository shall have contemporaneous records of actions and administration processes that are relevant to storage and preservation of the AIPs.

Essential requirements not fulfilled

The repository shall have procedures for all actions taken on AIPs.

Essential requirements not fulfilled

The repository shall be able to demonstrate that any actions taken on AIPs were compliant with the specification of those actions.

Essential requirements not fulfilled


INFORMATION MANAGEMENT

The repository shall specify minimum information requirements to enable the Designated Community to discover and identify material of interest.

Requirements fulfilled

All descriptive information can be searched with the free text search function of the MediaWiki software. For example, someone interested in instrumental music can find it by searching for the metadata attribute Vocal range together with the value instrumental.

Search results for Vocal range instrumental

Another option to discover material of interest is the category system. Every work is added to several categories such as genres, country of origin, creation year, recording formats, digitization devices etc. The example recording above is in several categories; one is the recording label Decca Records. This information can be used to show all recordings of Decca Records in the public domain archive:

Category:Decca_Records

The repository shall capture or create minimum descriptive information and ensure that it is associated with the AIP.

Minor requirements are not fulfilled

The minimum information requirements are specified by the Audio file template in the wiki. This template acts as the input mask when the SIP is built. The template page also includes the documentation about the usage of this template.

Audio file template:
http://pool.publicdomainproject.org/index.php/Template:Audio_file

However, there is no documentation about the vocabulary that should be used to fill in the metadata fields.

Responsibility for the procurement of metadata lies in the ingestion process where the SIP is assembled.

Capturing provenience and context information with help of this template is done manually and is done until all needed information is found. The minimum level is determined by the requirement that the public domain project is only allowed to publish works that are in the public domain (copyright free). So at least the information to decide on the legal status of the work must be present. This includes title, all authors and there living dates, first release date and publishing label. Technical metadata is also captured with this template like format of the analog recording, devices used for digitalization, catalog and stamper numbers and track length.

A finished SIP could look like this example: Decca-wa782-kwa5215

In addition to the descriptive information captured with the Audio file template, the SIP also gets context information through categorization. The public domain project uses a polyhierarchical category tree that differentiates between genres, country of origin, creation year, recording formats, digitization devices etc.

Documentation of the categorization process for an AIP is missing.

As shown under the requirement "The repository shall have and use a convention that generates persistent, unique identifiers for all AIPs", there is a naming scheme in use that provides persistent, unique identifiers for all AIPs and descriptive information as long as only shellac records are archived.

There is no detailed process workflow documentation.

There is no system and technical architecture documentation.

The repository shall maintain bi-directional linkage between each AIP and its descriptive information.

Minor requirements are not fulfilled

One direction is from the descriptive information to the AIP. This is achieved by a link from the wiki page with the descriptive information to the Flac file. It is also achieved by the use of a unique, persistent identifier for each work. This identifier is used as the name of the wiki page containing the descriptive metadata and also as the file name of the Flac file.

Example:

For the other direction, the unique, persistent identifier is used to locate the descriptive metadata in the wiki. This can be done by appending the identifier to the URL http://pool.publicdomainproject.org/index.php/ or by using the search function of the wiki.
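The linkage by identifier can be illustrated with a short sketch. The base URL is the one given above; the assumption that the Flac file name is simply the identifier followed by ".flac" is made for illustration and is not confirmed by this audit.

```python
from urllib.parse import quote

WIKI_BASE = "http://pool.publicdomainproject.org/index.php/"


def metadata_url(identifier):
    """Build the wiki URL of the descriptive information for an identifier."""
    return WIKI_BASE + quote(identifier)


def flac_file_name(identifier):
    # Assumption: the file name is the identifier plus the ".flac" extension.
    return identifier + ".flac"


def identifier_from_file_name(file_name):
    """Recover the identifier from a Flac file name (reverse direction)."""
    suffix = ".flac"
    if not file_name.lower().endswith(suffix):
        raise ValueError("not a Flac file name: " + file_name)
    return file_name[: -len(suffix)]
```

With the example identifier Decca-wa782-kwa5215, both directions can be followed without any lookup table, which is the main benefit of using one identifier for the page name and the file name.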

A URL (web link) to the descriptive information inside the Flac metadata tags would be beneficial.

For the physical records this bidirectional linking also holds, as the context information (category) in the wiki contains the physical location of the record. In the opposite direction, the catalog or stamper number can be used to find the descriptive information, and the number of the container can be used to find the context information about the grouped items.

As with other requirements, there is a lack of documentation about the process workflow and technical architecture.

The repository shall maintain the associations between its AIPs and their descriptive information over time.

Essential requirements not fulfilled


ACCESS MANAGEMENT

The repository shall comply with Access Policies.

Requirements fulfilled

The public domain project allows free unlimited access to its collection. So there is no need for user accounts or access management to use the available items.

This is documented for the designated communities on the landing page for the media pool and also on the multi language frequently asked questions (FAQ) page:

From Media pool main page:
Creative works of literature, science and art are subject to copyright law. Works in the public domain are those whose intellectual property rights have expired. With the help of volunteers, our team cleans, cataloged and digitized hundreds of gramophone records. After the clearing of copyrights, free works are available inside our media pool and Wikimedia Commons, compressed in Flac without any loss in quality (24-bit/192kHz).

And further down on the same page:
Permission: distributing, reproducing, streaming, sampling, remixing

From the FAQ:
Question: What I'm allowed to do with the music files?
Answer: There is no restriction. You can for ex. redistribute it, copy it, modify it, use it in your own productions, use it as background music and so on.

Unlike most other archives, the project allows everybody to contribute by supplying additional metadata or context information, or by submitting SIPs to the archive. The project is and should be driven by volunteers, as this is the base principle of this (and related) archives.

To be able to do so, a user needs to create a user account and a wiki administrator needs to give writing rights to this user. This is more complicated than usual for a wiki, but it had to be made that strict because of severe spamming problems. Without more wiki administrators the project is not able to maintain easy writing access and keep the wiki free from spam.

The repository shall log and review all access management failures and anomalies.

Minor requirements are not fulfilled

Two logging systems relate to access management failures. The first consists of the logging features of the MediaWiki software. These logs can be accessed via the Special pages link in the wiki: Logs from the data pool wiki

The second logging system consists of the logs from the web server application (apache2) and a user front end (piwix) to create statistics and analyze these logs:

Not fulfilled is the requirement that written notes should exist of reviews undertaken or action taken as a result of reviews.

In the discussion in the CCSDS_652.0-M-1 document, one important concern is that valid users might be denied access.

Due to the nature of the public domain project, this requirement is fulfilled once the requirement "The repository shall maintain the associations between its AIPs and their descriptive information over time" is met: if it is possible to access the AIP from the descriptive metadata, it is possible for the designated communities to get the AIP too. That requirement, however, is not yet fulfilled.

The repository shall follow policies and procedures that enable the dissemination of digital objects that are traceable to the originals, with evidence supporting their authenticity.

Requirements fulfilled

From the discussion section of this requirement: This requirement is concerned only with the relation between DIPs and the AIPs from which they are derived; elsewhere the link between the originals SIPs and the AIPs is considered.

The public domain project delivers the Flac file directly from the archival storage as the DIP, without modification. The designated community is able to check the authenticity of this Flac file because there are CRC and hashes included in the Flac file to detect transmission errors.

Additionally it would be helpful for the designated community to include the hash values of each AIP in the metadata details web page.
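How such a hash value could be computed for publication on the metadata page is sketched below. The choice of SHA-256 is an assumption made for this example; the audit does not name a hash algorithm.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A member of the designated community could run the same function on the downloaded file and compare the result with the published value.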

The repository shall record and act upon problem reports about errors in data or responses from users.

Minor requirements are not fulfilled

The repository acts quickly on problem reports from members of the designated community or from internal staff. Most of the time problems are reported by e-mail and then forwarded to the responsible person.

There are no formal processes for problem reports, and the reports from recent years are not archived in a central place where they can be reviewed or tracked to see whether they have been solved.


INFRASTRUCTURE AND SECURITY RISK MANAGEMENT

TECHNICAL INFRASTRUCTURE RISK MANAGEMENT

Essential requirements not fulfilled

The repository shall identify and manage the risks to its preservation operations and goals associated with system infrastructure.

Essential requirements not fulfilled

The repository shall employ technology watches or other technology monitoring notification systems.

Essential requirements not fulfilled

The repository shall have hardware technologies appropriate to the services it provides to its designated communities.

Minor requirements are not fulfilled

Maintenance of up-to-date Designated Community technology, expectations, and use profiles; provision of bandwidth adequate to support ingest and use demands; systematic elicitation of feedback regarding hardware and service adequacy; maintenance of a current hardware inventory.

The server hardware for hosting the ingest, search and delivery services was upgraded in spring 2015. Its performance is very good compared to the current workload, and it is ready to handle many more users.

The archival storage system is still fast enough for the current demands and also still has enough storage capacity for the near future. In the archival storage system, 50% of the slots are spare and available for additional hard drives.

The Internet connectivity is a symmetrical 1 Gbit/s connection without traffic limitation. For the current user numbers this is enough to achieve short download times.

This requirement is not fulfilled because there exists no current hardware inventory and there is no procedure or system to ask for and receive feedback from the designated communities.

The repository shall have procedures in place to monitor and receive notifications when hardware technology changes are needed.

Essential requirements not fulfilled

There are no written procedures, but monitoring systems are in place to observe server workloads, memory usage, network traffic and free archival storage space.
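As an illustration of the kind of check such monitoring performs for free storage space, a minimal sketch follows. It is not the monitoring system actually in use, and the 20% threshold is an arbitrary assumption.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Return the share of storage that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def storage_needs_attention(path, min_free_fraction=0.2):
    """Report True when free space on the file system holding path is below the threshold."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free in bytes
    return free_fraction(usage.total, usage.free) < min_free_fraction
```

A written procedure would additionally have to state who is notified and which action follows when the check reports True.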

The repository shall have procedures in place to evaluate when changes are needed to current hardware.

Essential requirements not fulfilled

The repository shall have procedures, commitment and funding to replace hardware when evaluation indicates the need to do so.

Essential requirements not fulfilled

The repository shall have software technologies appropriate to the services it provides to its designated communities.

Minor requirements are not fulfilled

The examples of ways the repository can demonstrate it is meeting this requirement clearly show that several requirements have to be met:

Maintenance of up-to-date Designated Community technology, expectations, and use profiles
At the time of writing, the expectations of the designated community are moving towards mobile use on smartphones and tablets. The MediaWiki software is not yet very well suited for these devices. This is a known problem and the MediaWiki community is working on it. On desktop and laptop computers the project gives access in a useful way. The finding aids could be improved for the global community.

The on-line radio streams and the page to access these streams seem to fulfill the needs.

Provision of software systems adequate to support ingest and use demands
Software support for ingest is weak.

Systematic elicitation of feedback regarding software and service adequacy
There is no systematic elicitation of feedback about software topics.

Maintenance of a current software inventory
The software inventory is managed via the package manager apt used in Debian GNU/Linux. This covers most of the software in use, such as the operating system, common services, web server, on-line radio software etc. MediaWiki and the statistics tool piwix are maintained separately. To help the system administrator, MediaWiki provides an inventory of installed extensions and their version numbers, together with the version numbers of the dependencies: Version number of MediaWiki and its extensions
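A snapshot of the package-managed part of the inventory could be exported as sketched below. The sketch assumes a Debian system where the dpkg-query command is available; software maintained outside the package manager (MediaWiki, piwix) is not covered by it.

```python
import subprocess


def parse_inventory(text):
    """Turn tab-separated 'package<TAB>version' lines into a dictionary."""
    inventory = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, version = line.partition("\t")
        # With multi-arch systems a package name may occur twice; the last entry wins here.
        inventory[name] = version
    return inventory


def installed_packages():
    """Query dpkg for all installed packages and their versions."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package}\\t${Version}\\n"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_inventory(result.stdout)
```

Storing such a snapshot with a date would give a record of the software inventory at the time of each audit.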

The repository shall have procedures in place to monitor and receive notifications when software changes are needed.

Essential requirements not fulfilled

The repository shall have procedures in place to evaluate when changes are needed to current software.

Essential requirements not fulfilled

The repository shall have procedures, commitment, and funding to replace software when evaluation indicates the need to do so.

Essential requirements not fulfilled

The repository shall have adequate hardware and software support for backup functionality sufficient for preserving the repository content and tracking repository functions.

Essential requirements not fulfilled

The repository shall have effective mechanisms to detect bit corruption or loss.

Essential requirements not fulfilled

The repository shall record and report to its administration all incidents of data corruption or loss, and steps shall be taken to repair/replace corrupt or lost data.

Essential requirements not fulfilled

The repository shall have a process to record and react to the availability of new security updates based on a risk-benefit assessment.

Requirements fulfilled

To be informed which security updates are available, the public domain project is subscribed to these two mailing lists:

Log files from the package manager are available on the server to check which software and patches were installed.

The risk-benefit analysis is done by the Debian security team. These volunteers monitor the newly discovered security problems in the Debian stable systems (GNU/Linux operating system and important application software). They prepare, test and provide patches against these problems and try to make sure that the expected behavior does not change.
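How these log files can be evaluated is sketched below. The sketch assumes the usual dpkg log format of Debian (by default in /var/log/dpkg.log), where install and upgrade lines consist of date, time, action, package, old version and new version; whether the project evaluates this file or another package manager log is not stated in this audit.

```python
def parse_dpkg_log(text):
    """Extract install and upgrade events from the content of a dpkg log file."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        # Other line types (status, configure, startup ...) are ignored.
        if len(parts) == 6 and parts[2] in ("install", "upgrade"):
            date, time, action, package, old_version, new_version = parts
            events.append(
                {
                    "timestamp": date + " " + time,
                    "action": action,
                    "package": package,
                    "old_version": old_version,
                    "new_version": new_version,
                }
            )
    return events
```

The resulting list can be compared with the announcements from the mailing lists to confirm that an announced patch was in fact installed.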

For the MediaWiki software the risk-benefit analysis is done by the system administrator. But normally MediaWiki security patches are well tested and if there is any side effect it is documented by the developers: Archive of MediaWiki-announce mails

The repository shall have defined processes for storage media and/or hardware change (e.g., refreshing, migration).

Essential requirements not fulfilled

The repository shall have identified and documented critical processes that affect its ability to comply with its mandatory responsibilities.

Essential requirements not fulfilled

The repository shall have a documented change management process that identifies changes to critical processes that potentially affect the repository's ability to comply with its mandatory responsibilities.

Essential requirements not fulfilled

The repository shall have a process for testing and evaluating the effect of changes to the repository's critical processes.

Essential requirements not fulfilled

The repository shall manage the number and location of copies of all digital objects.

Essential requirements not fulfilled

The repository shall have mechanisms in place to ensure any/multiple copies of digital objects are synchronized.

Essential requirements not fulfilled


SECURITY RISK MANAGEMENT

The repository shall maintain a systematic analysis of security risk factors associated with data, systems, personnel, and physical plant.

Essential requirements not fulfilled

The repository shall have implemented controls to adequately address each of the defined security risks.

Essential requirements not fulfilled

The repository staff shall have delineated roles, responsibilities, and authorizations related to implementing changes within the system.

Essential requirements not fulfilled

The repository shall have suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an offsite copy of the recovery plan(s).

Essential requirements not fulfilled