Artemis MP3 Search Engine System And Software Requirements Definition

Document last modified: 2000-02-18

Authors:


Table of Contents

  1. Introduction
  2. Project Requirements
  3. System Requirements
  4. Glossary

1 Introduction

1.1 Purpose of SSRD

The purpose of the System and Software Requirements Definition is to inform the customer of all the responsibilities, constraints and limitations associated with the project. This document will help identify and outline the requirements for the proposed system's performance, usability, and future maintenance.

1.2 Reference

2 Project Requirements

The following section specifies the project requirements and relates them to the acceptability of our final product.

2.1 Budget and Schedule Requirements

2.1.1 Budget Requirements

There are no budget requirements associated with Artemis.

2.1.2 Schedule Requirements

Project Requirement-01

Title: Schedule Constraint
Description: This project must be completed by the end of this semester.
Measurable: By determining how many of the requirements were fulfilled by the end of the semester.
Achievable: With strict adherence to the project deadlines and constant communication with our customer, we hope to complete the project within the constraint, leaving considerable time for testing and refinement.
Relevant: Failing to complete the project within the scheduled time can result in a failing grade from the professor.
Specific: Our customer needs the product by the end of this semester or Guitar Notes will be forced to continue to use an inadequate search engine.

Project Requirement-02

Title: Enhance current search engine's spidering range
Description: The developers must overhaul the current search engine to broaden its searching capabilities in order to accommodate the needs of our customer. The searches should extend to a variety of internet sites to increase the likelihood of meeting the users' demands.
Measurable: Running test queries before and after implementing modifications in the search engine while closely monitoring the results.
Achievable: Through extensive research on the part of the developers as to the best choice for intermediate sites.
Relevant: The achievement of this major goal holds the key to the passing grade mentioned in Project Requirement-01, and covers the most pressing modification called for by the shortcomings of the current system.
Specific: Increasing the searching capabilities of the current search engine will leave the site's web users more satisfied.

Project Requirement-03

Title: Produce reliable search results
Description: To ensure reliable search results from each query, meaning that the link returned contains the correct URL, the audio file actually corresponds to the query, and the audio file still resides at the associated URL (i.e., the link has not become broken).
Measurable: The number of successful audio files opened should increase if all files delivered actually contain the MP3s they allegedly link to.
Achievable: With the development of a scanning program that identifies all dead links and deletes them.
Relevant: The quality of matches is at least as important an enhancement to the system as the quantity of matches generated.
Specific: Providing reliable links increases the popularity of Guitar Notes as well as its credibility.

2.2 Development Requirements

2.2.1 Programming Languages Requirements

Title: Specified Programming Language == PERL
Description: The customer has requested that all files be written in Perl 5.
Measurable: If the language is exclusively Perl, it should run on the Perl 5 interpreter.
Achievable: Through intensive training in the Perl programming language on the part of the developers
Relevant: The current system is written in Perl, thus allowing easy integration of the new component that is to be our search engine
Specific: The use of any other programming languages, whether embedded or used alongside Perl, would result in a great deal of confusion and the need for more complex documentation.

2.2.2 Tools Requirements

There are no specific tools requirements for this project. The nature of Artemis allows for flexibility in the development environment and resources

2.2.3 Computer Hardware Requirements

Since this project is being implemented in Perl, which is an interpreted language, there are no specific hardware requirements.

2.2.4 Computer Communication Requirements

Title: Compliance with CGI Protocol
Description: The Search engine must be able to communicate with the web interface via the Common Gateway Interface (CGI)
Measurable: Artemis must be able to correctly process GET and POST requests as dictated by the CGI
Achievable: Through careful study of the CGI particulars and subsequent coding
Relevant: The search engine requires a means of communication with the user
Specific: The nature of the project, a web search engine, calls for the adoption of a universal means of accepting and returning data. CGI will allow the user to submit a query, and subsequently allow the search engine to communicate its findings
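
As an illustration of what processing GET and POST requests involves, the following is a minimal sketch. It is written in Python purely for readability (the delivered code is required to be Perl), and the function name and the form field "q" are assumptions, not part of the customer's interface. Under CGI, GET data arrives in the QUERY_STRING environment variable, while POST data arrives on standard input with its length in CONTENT_LENGTH.

```python
import io
from urllib.parse import parse_qs


def read_cgi_params(environ, stdin):
    """Return the submitted form fields as a dict mapping names to lists."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the request body is on standard input, and CONTENT_LENGTH
        # says how much of it to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the data is carried in the QUERY_STRING variable.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)
```

In a real CGI script the two arguments would be the process environment and standard input; passing them in explicitly keeps the sketch easy to test.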

2.2.5 Computer Software Requirements

Title: FreeBSD Operating System
Description: The Artemis MP3 Search engine will run on the FreeBSD Operating System
Measurable: Via testing of Artemis on the target operating system.
Achievable: This is achievable by doing the actual coding and implementation directly on the FreeBSD operating system.
Relevant: The Guitar Notes server is currently running on the FreeBSD operating system
Specific: Keeping the Artemis development compatible with the current system will minimize the possible complications that could arise during the integration of the search engine

Title: Plain Text Data Formatting
Description: Artemis must be able to read and write data in plaintext format
Measurable: Testing of the search engine should yield legible results
Achievable: Our choice of programming language is ideal for the manipulation of plaintext
Relevant: The Guitar Notes' current database is a collection of text files. Our system will need to interact with these files
Specific: The search engine will be scanning this database. It will also involve the annexing of URLs to these files.
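
The reading and annexing described above could look roughly like the sketch below. It is in Python for illustration only (the project itself is to be written in Perl), and it assumes a hypothetical layout of one URL per line; the actual layout of the Guitar Notes text files is not specified in this document.

```python
def append_url(path, url):
    """Annex one URL to the end of a plain-text database file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(url + "\n")


def read_urls(path):
    """Return every non-empty line of the file as a list of URLs."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```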

2.2.6 Standards Compliance Requirements

Title: HTTP 1.1
Description: The transmission of data to the web interface must satisfy the HTTP protocol
Measurable: If we neglect this, the return will generate a server error. If it is properly complied with, the search results will be displayed by the web browser
Achievable: With simple adherence to the HTTP 1.1 guidelines
Relevant: The search engine is web based.
Specific: It must communicate with the browser to successfully transmit its findings
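
To make the compliance point concrete: a CGI program's output must begin with a header block, followed by a blank line, before any page content. The sketch below shows that shape. It is written in Python for illustration (the project code is to be Perl), and the function name is an assumption.

```python
def cgi_response(body, content_type="text/html"):
    """Build CGI output: a Content-Type header, a blank line, then the body."""
    # Omitting the header or the blank line is a typical cause of the
    # server error mentioned above.
    return f"Content-Type: {content_type}\r\n\r\n{body}"
```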

2.2.7 Evolution Requirements

The customer does not currently have any specific evolutionary ideas in mind. Professor Nieh does require that in the event that he does formulate an idea, it should be easy to implement by modifying our current system. To that end, we will endeavor to keep the code modular, scalable, and extraordinarily well documented.

2.3 Packaging Requirements

Professor Nieh will provide the team with a test directory on the Guitar Notes server. After extensive testing has demonstrated the code's effectiveness and reliability, Artemis will be uploaded to the guitarnotes.com web site by the team where it will face the ultimate test: the random user. Prof. Nieh will oversee integration of Artemis into the guitarnotes.com website. A compilation of all technical documentation produced in parallel with the code will also be delivered at the point of installation.

3 System Requirements

3.1 System Definition

The actual search engine comprises the primary portion of the Artemis MP3 Search engine project. It will search through the local database, then through the specified additional websites, all the while attempting to match the user's query as closely as possible. The search engine's principal accomplice is a continuous spider that traverses the WWW in the background searching for audio files to add to the database. The database will feature a verifier that iterates through the URLs stored to verify that they are still valid and functioning, and removes all that are not.

System Block Diagram

3.2 System Capability Requirements

Every one of the following capabilities has been deemed mandatory by the customer. Each one is a priority and a necessary element in the realization of the project's goals. None are dispensable.

Proposed Capability-01

Title: Internet searching application
Description: We, the developers, will create a new database of web sites that are likely to contain up-to-date collections of audio files. When the user enters a query, the search engine will no longer merely spider the current Guitar Notes database. It will concurrently search, using some artificially intelligent application, the internet sites listed in the new database.
Measurable: By analysis of the quantitative improvement in search findings.
Achievable: Through independent internet research on the part of the developers to pinpoint the appropriate intermediate sites
Relevant: Users are more likely to find exactly what they are looking for when the information from multiple internet resources is pooled.
Specific: Users can expect both a greater number and an enhanced diversity of search results.
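
The pooling of results from several sources can be sketched as follows. This is an illustrative Python sketch only (the project is to be implemented in Perl); the function names, the (title, URL) record shape, and the simple all-words matching rule are assumptions. It also searches the sources one after another rather than concurrently, to keep the example short.

```python
def matches(query, text):
    """True if every word of the query appears in the text, ignoring case."""
    lowered = text.lower()
    return all(word in lowered for word in query.lower().split())


def pooled_search(query, sources):
    """Search each source in turn and pool the matches, skipping duplicates.

    `sources` is a sequence of collections of (title, url) pairs, for
    example the local database first and then each additional site.
    """
    seen = set()
    results = []
    for source in sources:
        for title, url in source:
            if url not in seen and matches(query, title):
                seen.add(url)
                results.append((title, url))
    return results
```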

Proposed Capability-02

Title: Periodic filtering of database
Description: The database needs some system to scan the URLs at regular intervals, removing any dead links detected.
Measurable: Reduction in the number of dead files delivered to the user could be measured either by merely soliciting feedback from the user or by testing/sampling actual results.
Achievable: With some simple code
Relevant: This function would represent a vital step in the goal to transmit reliable search results.
Specific: Users tend to find dead links or misinformation very frustrating and discouraging, and could develop negative feelings to the Guitar Notes project in general if subjected to it too frequently.
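
A minimal sketch of such a filter is given below, in Python for illustration only (the project itself must be written in Perl). The function names are assumptions, and treating any response with a status below 400 as "alive" is one possible policy, not a customer requirement.

```python
import urllib.error
import urllib.request


def is_alive(url, timeout=10):
    """Return True if a HEAD request to the URL succeeds."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Malformed URLs, unreachable hosts, and HTTP errors all count as dead.
        return False


def remove_dead_links(urls, check=is_alive):
    """Keep only the URLs for which the check reports a live link."""
    return [url for url in urls if check(url)]
```

The check is passed in as a parameter so that the filtering logic can be tested without touching the network.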

Proposed Capability-03

Title: Delivery of results in specified format
Description: The customer requires that each MP3 file returned be accompanied by the artist's name, the song title, the size of the file, and the nature of the audio file.
Measurable: The testing of the search engine should yield results in this format
Achievable: Using a system that extracts the aforementioned data from the web site associated with the audio file
Relevant: The data will assist the user in identifying the relevance of the returned file to his/her query.
Specific: In the event that the search has missed its mark in certain returns, the user will avoid the frustration of having to open irrelevant audio files.
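
One possible shape for such a result record is sketched below in Python, for illustration only (the project is to be written in Perl). The field names, the example value for the nature of the file, and the one-line display layout are all assumptions; the customer has specified only which pieces of information must accompany each file.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    artist: str
    title: str
    size_bytes: int
    kind: str  # nature of the audio file; "live" is a hypothetical value
    url: str


def format_result(result):
    """Render one result as a single line of text."""
    size_mb = result.size_bytes / (1024 * 1024)
    return (f"{result.artist} - {result.title} "
            f"({size_mb:.1f} MB, {result.kind}) {result.url}")
```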

Proposed Capability-04

Title: Continuous background searching application
Description: The search engine should feature a program that automatically spiders the internet for audio files of random recordings, 24 hours a day, seven days a week, entirely unprompted. Its findings would be automatically sorted and annexed to the Guitar Notes database.
Measurable: The very size of the database should increase dramatically.
Achievable: Using code that retrieves web pages and scans their links for audio files.
Relevant: The task of expanding the database manually would consume countless hours of site developers' time (and thereby interfere with the aforementioned semester schedule constraint). Relying on submission of audio file URLs by web users is not a sufficient route to expansion, primarily because they are the intended beneficiaries, not the providers, but also because their involvement is unpredictable and erratic.
Specific: The very blindness of the method is likely to unearth material/sites that a developer would never conceive of probing.
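
The link-scanning step can be sketched as follows. The example is in Python for illustration only (the spider itself is to be written in Perl); the class and function names are assumptions, and recognizing audio files by an ".mp3" suffix is a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AudioLinkParser(HTMLParser):
    """Collect anchor targets, separating audio files from other pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.audio_links = []  # candidates for the database
        self.page_links = []   # pages the spider could visit next

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        if absolute.lower().endswith(".mp3"):
            self.audio_links.append(absolute)
        else:
            self.page_links.append(absolute)


def extract_links(base_url, html):
    """Return (audio_links, page_links) found in one page of HTML."""
    parser = AudioLinkParser(base_url)
    parser.feed(html)
    return parser.audio_links, parser.page_links
```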

Proposed Capability-05

Title: Ongoing transmission of results
Description: The search engine should deliver matches as they are found rather than forming a collection and delivering the final package. This will infuse a sense of instant gratification in the user.
Measurable: Reduction in the impatience of the user could be observed by monitoring the frequency of suspended searches, or by merely soliciting feedback from the user.
Achievable: Using code that eliminates the intermediate buffer so that the search results are flushed to the browser continuously
Relevant: The new and improved search engine will most likely take more time than the original, since it will search a greater area. This circumstance calls for greater caution in keeping the customer waiting for an answer.
Specific: Users of search engines often do not require more than a small number of matches that meet their criteria, and would consider their search successful as soon as these were displayed. This method also constructs the illusory impression that the search engine is operating at a greater speed.
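
The idea of flushing each match as it is found can be sketched as below, in Python for illustration only (the project is to be written in Perl). The function name is an assumption. The results argument may be a generator, so that each match is written and flushed before the next one has even been computed.

```python
import sys


def stream_results(results, out=sys.stdout):
    """Write each result as soon as it is available and return the count."""
    count = 0
    for line in results:
        out.write(line + "\n")
        out.flush()  # push the match to the browser without waiting
        count += 1
    return count
```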

3.3 User Interface Requirements

3.3.1 Graphical User Interface Requirements

This project is not responsible for developing a graphical user interface. The only interface of relevance is the Guitar Notes website, which already exists.

3.3.2 Command-Line Interface Requirements

The search engine part of the project has no command line interface requirements. The spider will have the following command line interface requirements:
    % artemis-spider [OPTION]... [URL]... [FILE]...

        -s, --start [URL|FILE] (starts the spider with the specified
                                URL or file containing URLs)

        -v, --verbose (verbose output)

        -m, --max NUM (maximum number of pages to spider before stopping)

        -V, --version (prints version number and exits)

The verifier will have the following command line interface requirements:
    % artemis-verify [OPTION]... [FILE]...

        -v, --verbose (verbose output)

        -V, --version (prints version number and exits)
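
For illustration, the spider's options listed above could be declared as in the following sketch. It is written in Python with the standard argparse module (the delivered tools must be Perl), and the function name and the version string are placeholders, not project decisions.

```python
import argparse


def build_spider_parser():
    """Declare the artemis-spider options listed in this section."""
    parser = argparse.ArgumentParser(prog="artemis-spider")
    parser.add_argument("-s", "--start", metavar="URL|FILE",
                        help="URL, or file containing URLs, to start from")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="verbose output")
    parser.add_argument("-m", "--max", type=int, metavar="NUM",
                        help="maximum number of pages to spider")
    # The version string below is a placeholder.
    parser.add_argument("-V", "--version", action="version",
                        version="artemis-spider 0.0")
    parser.add_argument("targets", nargs="*", metavar="URL|FILE",
                        help="additional URLs or files")
    return parser
```

For example, parsing "-v -m 50 -s http://example.com/" yields verbose output, a limit of 50 pages, and the given start URL.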

3.3.3 API Provision Requirements

The API provided must comply with the CGI and HTTP 1.1 protocols. See the References section.

3.3.4 API Usage Requirements

API usage must comply with the CGI and HTTP 1.1 protocols. See the References section.

3.3.5 Diagnostics Requirements

There are no diagnostic requirements for this project.

3.4 Hardware and Communications Interface Requirements

There are no hardware or communications interface requirements for this project.

3.5 System Levels of Service Requirements

Level of Service-01

Title: Limbo Limit
Description: An upper limit on the response time for the search engine; the maximum amount of time that the user should be made to wait before being informed of the status of their search
Measurable: By monitoring the frequency of suspended searches, or by merely soliciting feedback from the user.
Achievable: Implementing an internal timer that ensures that some form of response is sent to the user before the limit expires.
Relevant: The customer is likely to feel neglected after a certain amount of time and will assume that no information is forthcoming.
Specific: Users would quickly lose patience with the search engine as a whole if they were left without feedback on too many occasions. The user must be informed, for example, that no matches have been found yet but that the search is continuing.
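
One way to picture the internal timer is the sketch below, in Python for illustration only (the project is to be written in Perl). The class and method names are assumptions, and the actual limit value is not fixed by this document. The search loop would call status_due() periodically and send a "still searching" message whenever it returns true.

```python
import time


class LimboTimer:
    """Track how long the user has gone without any output."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self.limit = limit_seconds
        self.clock = clock
        self.last_output = clock()

    def mark_output(self):
        """Record that something was just sent to the user."""
        self.last_output = self.clock()

    def status_due(self):
        """True once the limit has passed since the last output."""
        return self.clock() - self.last_output >= self.limit
```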

Level of Service-02

Title: Verification Frequency
Description: We will need to specify the length of the interval between database scans
Measurable: Dead links are removed from database after the specified period has elapsed
Achievable: Via a piece of code that monitors the passing of time, and calls the verifier method whenever the set time has passed
Relevant: To the measure of reliability of the database's URLs
Specific: Overly frequent scanning is unnecessary, whereas scanning that is not done frequently enough will not satisfy the customer's reliability requirement
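
The time check that triggers the verifier can be sketched as below, in Python for illustration only (the project is to be written in Perl). The function names and the use of hours as the unit are assumptions; the interval itself remains to be specified.

```python
from datetime import datetime, timedelta


def next_verification(last_run, interval_hours):
    """Return the time at which the next database scan should start."""
    return last_run + timedelta(hours=interval_hours)


def verification_due(last_run, now, interval_hours):
    """True once the configured interval has elapsed since the last scan."""
    return now >= next_verification(last_run, interval_hours)
```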

Level of Service-03

Title: System Popularity
Description: The search engine will have to handle the case of multiple users efficiently
Measurable: By testing the time discrepancies in individual searches when other searches are being conducted simultaneously.
Achievable: Meticulous file locking to protect the data integrity of each isolated search
Relevant: We do not want any crossing over in the searches of different users.
Specific: To our aim to produce accurate and reliable search results
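
As a rough illustration of file locking around a database update, the sketch below takes an exclusive advisory lock before appending a line. It is written in Python for illustration only (the project is to be written in Perl), it relies on the Unix-only fcntl module, and the function name is an assumption.

```python
import fcntl


def append_locked(path, line):
    """Append one line to a shared file while holding an exclusive lock."""
    with open(path, "a", encoding="utf-8") as f:
        # Block until no other process holds the lock on this file.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Advisory locks only protect against other processes that also take the lock, so every writer to the file would need to follow the same discipline.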

4 Glossary

API - Application Programming Interface
A defined set of interfaces through which software modules are added to, and interact with, a program.
CGI - Common Gateway Interface
The Common Gateway Interface (CGI) is a simple interface for running external programs, software or gateways under an information server in a platform-independent manner. See the References section.
FreeBSD
This is a free UNIX operating system for Intel architecture computers
HTTP - HyperText Transfer Protocol
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. See the References section.
Interpreter
A program that executes high-level instructions at run time, rather than translating the whole program to machine code in advance.
PERL - Practical Extraction and Report Language
A programming language written by Larry Wall that excels at manipulation of textual data.
Scalability
The ability of the hardware and software used for limited user testing to provide the same functionality to multiple internet users

html document maintained by Anders