Abstract
The rapid growth of information on the World Wide Web creates an increasing need for automatic systems that retrieve documents and rank them by their relevance to the user query. Many search engines are available, but most of them are hit-based, that is, they rank results by page rank. Very few are ontology-based, and those that exist often return poor results because of weak implementations. Our proposed system addresses this drawback and presents a Multilingual Information Retrieval approach that falls into the area of Domain Specific Information Retrieval. The system will use the API of a general page-rank-based search engine. The results will be processed by the Web Adviser, which uses an ontology to annotate documents and creates a precise result list by expanding the query. The Web Adviser will also monitor the user's information to infer the meaning of the query. The web pages will then be ranked by the semantic similarity between the ontological concepts extracted from the web pages and the ontological concepts represented by the user query.
1.6 PROBLEM STATEMENT
Design a system to extract ontology from unstructured information sources for enhancing user search results.
1.8 GOALS AND OBJECTIVES
• The system provides a way for machines to process data semantically. We use ontology learning methodologies to semantically model the significant concepts of a query along with their weighted semantic relations to other related concepts.
• Interoperability plays a major role in a multilingual ontology. Matching methods are important here because the system must automatically search for and match words with similar or dissimilar patterns.
• The main activities are:
(i) The user searches for documents using a query in any language.
(ii) The query is analyzed, and search results from Wikipedia are loaded into the system.
(iii) An ontology is extracted from these results. The system uses the generated ontology to look for translations. Simultaneously, the query is converted into English and run against the Tourism domain, and the related results are extracted. These results are categorized on the basis of the generated ontology.
1.9 RELEVANT MATHEMATICS ASSOCIATED WITH THE PROJECT
System Specification:
S = {S, s, X, Y, T, fmain, DD, NDD, ffriend, memory shared, CPUcount}
• S (system) :- The proposed system, described by the following tuple.
• s (initial state at time T) :- GUI of the search engine. The GUI provides a field where the user enters a query.
• X (input to system) :- Input query. The user first enters the query, which may or may not be ambiguous. The query represents what the user wants to search for.
• Y (output of system) :- List of URLs with snippets. After the user enters a query, the search engine generates a result list that contains relevant and irrelevant URLs and their snippets.
• T (no. of steps to be performed) :- 6. This is the total number of steps required to process a query and generate results.
• fmain (main algorithm) :- Contains process P. Process P contains the input, the output and the subordinate functions. It shows how the query is processed by the different modules and how the results are generated.
• DD (deterministic data) :- Data is fetched from the Internet at runtime, and the user's information is maintained in the database. All other data related to the search is processed at runtime and shown to the user.
• NDD (non-deterministic data) :- Number of input queries. A user can enter any number of queries, so we cannot predict how many queries will be entered in a single session. Hence, the number of input queries is our NDD.
• ffriend :- WC and IE. WC and IE are the friend functions of the main function. Since we use both, both are included in ffriend. WC is the Web Crawler, which is a bot, and IE is Information Extraction, which extracts information for display in the browser.
• Memory shared :- Database. The database stores information such as the list of receivers, registration details and the number of receivers. Since it is the only memory shared in our system, we have included it under memory shared.
• CPUcount :- 2. The system requires one CPU for the server and at least one CPU for the client. Hence, CPUcount is 2.
Subordinate functions:
• Identify the processes as P.
S = {I, O, P}
P = {QA, OP}
Where,
• QA is the query analyzer
• OP is the output processor
• P is the set of processes.
• QA = {Q, SA, Qr}
Where,
• Q is the user query
• SA is the semantic analysis performed on the query
• Qr is the resolved query, that is, the query related to the domain (the ontological meaning of the query)
• OP = {Qr, processing, Info}
Where,
• Qr is the output of the query analysis process
• processing: data related to the query is searched over the Internet, and the data links are scored by their relevance to the query
• Info is the data displayed to the user
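The decomposition P = {QA, OP} can be read as two functions applied one after the other: QA maps the user query Q to the resolved query Qr, and OP maps Qr to the displayed Info. The following Java sketch only illustrates that composition. The class and method names are hypothetical, the fixed "domain:tourism" tag is an assumption standing in for the real semantic analysis, and OP is reduced to formatting a string. It is written for Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of P = {QA, OP}: QA maps Q to Qr, OP maps Qr to Info.
class QueryPipeline {

    // QA: lower-cases and tokenizes the user query Q, then prefixes a domain tag
    // to form the resolved query Qr. The fixed tag is an assumption; the real
    // system would derive it through semantic analysis (SA).
    static List<String> analyzeQuery(String q) {
        List<String> qr = new ArrayList<>();
        qr.add("domain:tourism");
        for (String token : q.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                qr.add(token);
            }
        }
        return qr;
    }

    // OP: stands in for searching and scoring; here it only formats Qr as Info.
    static String processOutput(List<String> qr) {
        return "Results for " + String.join(" ", qr);
    }

    // P: the whole process, QA followed by OP.
    static String process(String q) {
        return processOutput(analyzeQuery(q));
    }

    public static void main(String[] args) {
        System.out.println(process("Forts near Pune"));
    }
}
```

Keeping QA and OP as separate functions mirrors the model above and lets each stage be tested on its own.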
1.10 NAMES OF CONFERENCES / JOURNALS WHERE PAPERS CAN BE PUBLISHED
1] IETE conference: International Conference on Emerging Trends in Engineering and Management Research (ICETEMR-17)
2] IJTIR: International Journal of Emerging Technologies and Innovative Research
1.11 REVIEW OF CONFERENCE / JOURNAL PAPERS SUPPORTING PROJECT IDEA
1.12 PLAN OF PROJECT EXECUTION
Fig 1.1. Plan from June to Nov
CHAPTER 2 TECHNICAL KEYWORDS
2.1 AREA OF PROJECT
Data Mining
2.2 TECHNICAL KEYWORDS
• Access controls
• Authentication
• Database processing
• Privacy
• Security
CHAPTER 3 INTRODUCTION
3.1 PROJECT IDEA
One of the first Multi-Language Information Retrieval (MLIR) systems was implemented in 1969 by Gerard Salton, who enhanced his SMART system to retrieve multilingual documents in two languages, English and German. The majority of information retrieval systems are monolingual and, more precisely, English-based. Our proposed system presents a Multi-Language Information Retrieval (MLIR) approach that falls into the area of Domain Specific Information Retrieval.
Many search engines are available. The drawback of current conventional web search engines is the knowledge gap between users and computers: the knowledge available to the computer is much more limited than the knowledge of the user. Our proposed system uses ontology learning on documents extracted from Wikipedia. This methodology is used to semantically model the significant concepts of a query along with their weighted semantic relations to other related concepts. The resulting ontology can be viewed as a benchmark of a topic that can be used to classify or re-rank documents based on their degree of similarity to the original query.
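To make the idea of "degree of similarity" concrete, the sketch below compares the set of concepts found in a query with the set of concepts found in a page. It uses the Jaccard coefficient, which is our substitution for illustration: the report does not state which similarity measure is used, and the weighted relations mentioned above are ignored here. The class name and the sample concepts are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: unweighted similarity between two concept sets.
class ConceptSimilarity {

    // Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Made-up concept sets for a query and a web page.
        Set<String> query = new HashSet<>(Arrays.asList("fort", "history", "pune"));
        Set<String> page = new HashSet<>(Arrays.asList("fort", "pune", "trek", "monsoon"));
        // 2 shared concepts out of 5 distinct concepts.
        System.out.println(jaccard(query, page)); // prints 0.4
    }
}
```

Pages with a higher score share more concepts with the query and would be placed higher in the re-ranked list.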
3.2 MOTIVATION OF THE PROJECT
The main motivation of the system is to provide a multilingual search engine to the user. The system analyzes the user's history from the database and provides links on that basis. It converts a query in any language into English and then searches for results. The results depend mainly on the ontology, so that only precise results are provided.
3.3 LITERATURE SURVEY
A web search engine is a software system that is designed to search for information on the World Wide Web. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
A search engine maintains the following processes in near real time:
• Web crawling
• Indexing
• Searching
Web search engines get their information by crawling from site to site. The "spider" first checks for the standard file robots.txt, which is addressed to it. It then sends information back to be indexed, depending on many factors such as the titles, page content, JavaScript, Cascading Style Sheets (CSS) and headings, as evidenced by the standard HTML markup of the informational content or by its metadata in HTML meta tags.
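The robots.txt check mentioned above can be sketched as follows. This is a deliberately simplified illustration and not a complete implementation of the robots exclusion rules: it only honours `Disallow` prefixes in groups addressed to all crawlers (`User-agent: *`) and ignores `Allow` lines, wildcards and crawler-specific groups. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt check for a crawler.
class RobotsRules {

    // Collects the Disallow prefixes listed under "User-agent: *".
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean appliesToAll = false;
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            String line = rawLine.trim();
            int hash = line.indexOf('#');          // strip trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash).trim();
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                          // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                appliesToAll = value.equals("*");
            } else if (field.equals("disallow") && appliesToAll && !value.isEmpty()) {
                prefixes.add(value);
            }
        }
        return prefixes;
    }

    // A path may be fetched if it starts with none of the disallowed prefixes.
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedPrefixes(robotsTxt)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        System.out.println(isAllowed(robots, "/index.html"));      // true
        System.out.println(isAllowed(robots, "/private/a.html"));  // false
    }
}
```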
Google is the world’s most popular search engine, with a market share of 71.11 percent as of September, 2016. The world’s most popular search engines (with >1% market share) are:
Google: 71.11%
Bing: 10.56%
Baidu: 8.73%
Yahoo: 7.52%
Currently, most organizations working in a multilingual environment demand ontologies that support different natural languages. Consequently, the inclusion of multilingual information retrieval is not an option but a must. In general, ontology is the study of reality. More specifically, an ontology is an expression of a particular model of reality, including a specification of concepts, relationships among concepts, and constraints that exist in the model.
CHAPTER 4
PROBLEM DEFINITION AND SCOPE
4.1 PROBLEM STATEMENT
Design a system to extract ontology from unstructured information sources for enhancing user search results.
4.1.1 Statement of scope
The system provides a solution to process data semantically and uses an ontology learning methodology. The system searches documents only in English and Marathi.
4.2 MAJOR CONSTRAINTS
The system will be able to search any query specific to the tourism domain.
4.3 METHODOLOGIES OF PROBLEM SOLVING AND EFFICIENCY ISSUES
Step 1: The user must register and log in before using the search engine.
Step 2: The user can search for a place or event using the search engine.
Step 3: QA: Query Analysis
Step 3.1: The user's query is tokenized.
Step 3.2: The tokens are analyzed, and a resolved query related to the domain is generated.
Step 3.3: The resolved query is sent to the data crawler to fetch data related to the topic.
Step 4: Data crawler
Step 4.1: The data crawler uses different APIs to collect data related to the query.
Step 4.2: The data related to the query is scored.
Step 4.3: A collection of data or links is created, each with a score for its relevance to the topic.
Step 5: Answer Page
a) Data is displayed to the user in order of score.
b) The user can visit the source of the data for more information or download documents if available.
Step 6: Stop.
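Steps 3.1, 4.2, 4.3 and 5 can be sketched in Java as shown below. The sketch is an illustration under stated assumptions, not the project's implementation: the score is simply the number of distinct query tokens that occur in a document, the documents are in-memory strings rather than crawled pages, and all names are hypothetical. The tokenizer keeps Unicode letters, combining marks and digits together so that Marathi words are not split. It is written for Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of tokenizing, scoring and ranking.
class SearchSteps {

    // Step 3.1: split on anything that is not a letter, combining mark or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{M}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 4.2: score = number of distinct query tokens found in the document.
    static int score(List<String> queryTokens, String document) {
        Set<String> docTokens = new HashSet<>(tokenize(document));
        int hits = 0;
        for (String q : new HashSet<>(queryTokens)) {
            if (docTokens.contains(q)) {
                hits++;
            }
        }
        return hits;
    }

    // Steps 4.3 and 5: order the documents by descending score.
    // List.sort is stable, so documents with equal scores keep their input order.
    static List<String> rank(String query, List<String> documents) {
        final List<String> queryTokens = tokenize(query);
        List<String> ranked = new ArrayList<>(documents);
        ranked.sort(Comparator.comparingInt((String d) -> score(queryTokens, d)).reversed());
        return ranked;
    }
}
```

For example, ranking the documents "Beaches in Goa", "Historic forts near Pune city" and "Pune weather" for the query "forts near Pune" places the second document first (three matching tokens), then "Pune weather" (one), then "Beaches in Goa" (none).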
4.4 OUTCOME
1. Login and Search page for input query.
2. Extract related documents.
3. Ontology generation.
4. Most relevant document decided by semantic analysis will be displayed to user.
4.5 APPLICATIONS
• Aerospace and defense
• Automotive
• Consumer products
• Travel
• Telecommunications
• Engineering and construction
• Banking
• Health care
4.6 HARDWARE RESOURCES REQUIRED
• Processor : Pentium IV or AMD
• Hard Disk : 500 GB
• RAM : 4 GB
4.7 SOFTWARE RESOURCES REQUIRED
• Front End
  o User Interface : Java
  o Programming Language : Java
  o IDE/Workbench : Eclipse Mars 2
• Back End
  o Database : MySQL
CHAPTER 5 PROJECT PLAN
5.1 PROJECT ESTIMATES
• Communication :-
1. Requirement Gathering
• Plan :-
1. Domain Specific Searching
2. Using ontology generation methodology
3. Searching Speed analysis
• Modules :-
1. Key phrase Extraction
2. Search and matching function
3. Cluster ranking and ontology generation
• Design :-
1. GUI
2. Database Design
3. Front End Design
4. Back End Design
• Code :-
1. Construction of Front End
2. Construction of Back End
5.1.1 Time Estimates
Month | Goal
July | Project selection, synopsis, literature survey
August | SRS document preparation, presentation of the project idea
September | Preparation of detailed algorithm. Deciding the software tools and hardware.
October | Preparation of 1st semester report, preparing presentation regarding final layout of the project
November | Presentation of 1st semester's work. Submission of 1st semester report.
December | Installation of software.
December-January | Coding of the graphical user interface and validation.
January | Database created. Testing of the front end.
February | Coding of the main modules.
March | Testing of the main modules. Presentation of the completed part of the project.
April-May | Inserting additional modules. Approval of the project by the guide.
May-June | Preparing 2nd semester final project report. Approval of the report by the guide.
June | Presentation of the entire project.
5.1.2 Project Resources
• Human Resources
1. Member 1: Mulla Nilofar
2. Member 2: Pathan Ayesha
3. Member 3: Shahapurkar Namrata
4. Member 4: Tayde Dipali
• Hardware Resources
1. Processor : Pentium IV or AMD
2. RAM : 4 GB
• Software
1. Operating System : Windows
2. JDK : jdk 1.7
3. Programming Language : Java 7
4. IDE : Eclipse
5.2 RISK MANAGEMENT W.R.T. NP HARD ANALYSIS
• P-Problem: A problem is assigned to the P (polynomial time) class if there exists at least one algorithm that solves it in a number of steps bounded by a polynomial in n, where n is the size of the input.
• NP-Problem: A problem is assigned to the NP (nondeterministic polynomial time) class if it is solvable in polynomial time by a nondeterministic Turing machine.
• NP-Hard: A problem is said to be NP-hard if every problem in NP can be reduced to it in polynomial time, so that an algorithm for solving it can be translated into one for solving any other NP problem. It is much easier to show that a problem is in NP than to show that it is NP-hard.
• NP-Complete: A problem that is both in NP and NP-hard is called an NP-complete problem.
• The proposed method uses the following techniques to solve the problem:
– Preprocessing, tokenization, keyword extraction and POS tagging. Because these techniques are heuristic, it is not possible to obtain 100% accurate results.
• Because the system combines several algorithms to solve the problem and to evaluate result quality, we treat the problem as NP-hard.
5.2.1 Risk Identification
For risk identification, the scope document, requirements specifications and schedule were reviewed. Answers to a questionnaire revealed some risks. Each risk is categorized as per the categories mentioned in [?]. Please refer to Table 5.1 for all the risks. The following risk identification questionnaire was used.
1. Have top software and customer managers formally committed to support the project?
2. Are end-users enthusiastically committed to the project and the system/product to be built?
3. Are requirements fully understood by the software engineering team and its customers?
4. Have customers been involved fully in the definition of requirements?
5. Do end-users have realistic expectations?
5.2.2 Risk Analysis
The risks for the project can be analyzed within the constraints of time and quality.
ID | Risk Description | Probability | Impact: Schedule | Impact: Quality | Impact: Overall
1 | Description 1 | Low | Low | High | High
2 | Description 2 | Low | Low | High | High
Table 5.1: Risk Table
Probability | Value Description
High | Probability of occurrence is >75%
Medium | Probability of occurrence is 26-75%
Low | Probability of occurrence is <25%
Table 5.2: Risk Probability definitions [?]
Impact | Value Description
Very high | >10% schedule impact or unacceptable quality
High | 5-10% schedule impact or some parts of the project have low quality
Medium | <5% schedule impact or barely noticeable degradation in quality
Low | Impact on schedule or quality can be incorporated
Table 5.3: Risk Impact definitions [?]
5.2.3 Overview of Risk Mitigation, Monitoring, Management
Following are the details for each risk.
Risk ID | 1
Risk Description | Description 1
Category | Development Environment
Source | Software Requirement Specification document
Probability | Low
Impact | High
Response | Mitigate
Strategy | Strategy
Risk Status | Occurred

Risk ID | 2
Risk Description | Description 2
Category | Requirements
Source | Software Design Specification documentation review
Probability | Low
Impact | High
Response | Mitigate
Strategy | Better testing will resolve this issue.
Risk Status | Identified
5.3 PROJECT SCHEDULE
5.3.1 Project task set
Major Tasks in the Project stages are:
Task 1:- Information Gathering.
Task 2:- Requirement Analysis.
Task 3:- Literature Survey.
Task 4:- Problem Statement Definition.
Task 5:- Define Specification.
Task 6:- Project Planning.
Task 7:- Detail Design.
Task 8:- Model Design Strategy.
Task 9:- Factors Authentication and Identification.
Task 10:- System Analysis And Execution Scenario.
Task 11:- Transaction Database Design.
Task 12:- Risk Analysis.
Task 13:- Software Development.
Task 14:- Testing and QA.
Task 15:- Final Delivery.
5.3.2 Task network
5.3.3 Timeline Chart
Fig 1.1. Plan from June to Nov
Fig 1.2 Plan from Dec to April
5.4 TEAM ORGANIZATION
The manner in which staff is organized and the mechanisms for reporting are noted.
5.4.1 Team structure
The team consists of four members. Tasks are distributed among the members and are defined for the proper execution of the project.
1. Mulla Nilofar
2. Pathan Ayesha
3. Shahapurkar Namrata
4. Tayde Dipali
5.4.2 Management reporting and communication
Team Leader: The team leader divides the tasks.
Team Developer: The developer develops the code for execution.
CHAPTER 6
SOFTWARE REQUIREMENT SPECIFICATION
6.1 INTRODUCTION
6.1.1 Purpose and Scope of Document
Search results are generally presented in a line of results, often referred to as search engine results pages. The main purpose of the system is to provide a search engine that fetches results from Wikipedia and that can convert a query from any other language into English.
6.1.2 Overview of responsibilities of Developer
1. Develop technical and functional specifications for projects.
2. Assist in determining time and cost estimates for assigned projects.
3. Develop new applications or make enhancements according to project needs.
4. Utilize programming principles, tools, and techniques to write application code.
5. Plan, coordinate and execute project activities to ensure timely completion.
6. Ensure project deliverables meet business requirements.
7. Prepare test cases and strategies for unit testing and integration testing.
8. Perform code reviews to identify basic technical and logical errors.
9. Resolve application development issues in a timely manner.
10. Manage project risks and milestones.
11. Develop best practices to improve productivity.
6.2 USAGE SCENARIO
6.2.1 User profiles
‘ User
‘ Web Portal
6.2.2 Use Case View
Use Case Diagram. Example is given below
Figure 6.1: Use case diagram
6.4 FUNCTIONAL MODEL AND DESCRIPTION
A description of each major software function, along with dataflow (structured analysis) or class hierarchy (Analysis Class diagram with class description for object oriented system) is presented.
6.4.1 Data Flow Diagram
6.4.1.1 Level 0 Data Flow Diagram
6.4.1.2 Level 1 Data Flow Diagram
6.4.1.3 Level 2 Data Flow Diagram
6.4.2 Activity Diagram:
6.4.3 Non Functional Requirements:
• Performance Requirements
To increase the searching speed, the ontology is constructed dynamically for the given query. This requires little space to store the dataset and takes less time to extract the ontology related to the topic.
• Software Quality Attributes
• Portability
Our system shall be 100% portable to all operating platforms that support the Java runtime environment. Therefore, the software should not depend on a particular operating system.
• Usability
Our system shall be easy to use for all users with minimal instructions. 100% of the language on the graphical user interface (GUI) shall be intuitive and understandable by non-technical users.
• Correctness
Our system shall fulfill all the objectives of users accurately.
• Installability
Our system will be easily installable on most machines as long as the basic requirements are met. The application can run on any personal computer.
• Maintainability
Our system is maintainable and can satisfy further user requirements, provided they can be delivered by the maintenance team. A basic characteristic of our design is that the software can meet new requirements and cope with defects so that it can be restored to a specified condition.
• Extensibility
Extensibility allows new components to be added to the system or to replace existing ones without affecting the components that remain in place. In other words, the software can be expanded on the basis of what is currently built.
6.4.4 State Diagram:
Figure 6.2: State transition diagram
6.4.5 Design Constraints
• The application is developed using Java, Eclipse Mars 2 and MySQL, and has a basic GUI.
6.4.6 Software Interface Description
1. The search box should be larger than the basic search box because people will want to do more elaborate search queries (approximately 30 characters).
2. A “Help” link on how to use the search functionality should be provided. This link should be close to the “Search” button.
3. Allow the user to specify search for “any of the words,” “all of the words,” or “exact phrase.” “Any of the words” should be the default. Either a drop-down select box or radio buttons will suffice.
4. If graphics are used as “action buttons” representing site areas that can be selected, it is crucial that these graphics look “click-able.”
5. Provide clear, direct instructions. Use simple words to explain the process: remove all jargon and technical terms, and make sure that all icons have labels.
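The three search options in item 3 can be illustrated with the Java sketch below. It is only an illustration: the class, enum and method names are hypothetical, words are compared case-insensitively after splitting on whitespace, and an empty query never matches. The caller would pass `ANY_WORD` when the user has not chosen an option, since that is the stated default.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the "any of the words", "all of the words"
// and "exact phrase" search options.
class SearchMatcher {

    enum MatchMode { ANY_WORD, ALL_WORDS, EXACT_PHRASE }

    static boolean matches(String text, String query, MatchMode mode) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = query.trim().toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return false;                       // nothing to search for
        }
        if (mode == MatchMode.EXACT_PHRASE) {
            return haystack.contains(needle);   // phrase must appear as typed
        }
        Set<String> words = new HashSet<>(Arrays.asList(haystack.split("\\s+")));
        boolean any = false;
        boolean all = true;
        for (String w : needle.split("\\s+")) {
            if (words.contains(w)) {
                any = true;
            } else {
                all = false;
            }
        }
        return mode == MatchMode.ANY_WORD ? any : all;
    }

    public static void main(String[] args) {
        String text = "Ajanta caves near Aurangabad";
        System.out.println(matches(text, "caves temples", MatchMode.ANY_WORD));    // true
        System.out.println(matches(text, "caves temples", MatchMode.ALL_WORDS));   // false
        System.out.println(matches(text, "ajanta caves", MatchMode.EXACT_PHRASE)); // true
    }
}
```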
CHAPTER 7
DETAILED DESIGN DOCUMENT USING APPENDIX A AND B
7.1 INTRODUCTION
7.2 ARCHITECTURAL DESIGN
Figure 7.1: Architecture diagram
7.2.1 Component Diagram
Fig 7.4 Component Diagram
7.2.2 Deployment Diagram
Fig 7.4 Deployment Diagram
7.3 TECHNICAL SPECIFICATION
7.3.1 Advantages
1. The system enhances the ranking performance.
2. It is useful for enabling reuse of domain knowledge.
3. The system is versatile and flexible.
7.3.2 Applications
• Aerospace and defense
• Automotive
• Consumer products
7.4.1 Class Diagram
Figure 7.2: Class Diagram
CHAPTER 8 PROJECT IMPLEMENTATION
8.1 INTRODUCTION
One of the first Multi-Language Information Retrieval (MLIR) systems was implemented in 1969 by Gerard Salton, who enhanced his SMART system to retrieve multilingual documents in two languages, English and German. The majority of information retrieval systems are monolingual and, more precisely, English-based. Our proposed system presents a Multi-Language Information Retrieval (MLIR) approach that falls into the area of Domain Specific Information Retrieval.
Many search engines are available. The drawback of current conventional web search engines is the knowledge gap between users and computers: the knowledge available to the computer is much more limited than the knowledge of the user. Our proposed system uses ontology learning on documents extracted from Wikipedia. This methodology is used to semantically model the significant concepts of a query along with their weighted semantic relations to other related concepts. The resulting ontology can be viewed as a benchmark of a topic that can be used to classify or re-rank documents based on their degree of similarity to the original query.
8.2 TOOLS AND TECHNOLOGIES USED
• Java:
About Java: Java has been tested, refined, extended, and proven by a dedicated community of Java developers, architects and enthusiasts. Java is designed to enable development of portable, high-performance applications for the widest range of computing platforms possible. By making applications available across heterogeneous environments, businesses can provide more services and boost end-user productivity, communication, and collaboration, and dramatically reduce the cost of ownership of both enterprise and consumer applications.
• Hibernate:
Hibernate ORM (Hibernate in short) is an object-relational mapping framework for the Java language. It provides a framework for mapping an object-oriented domain model to a relational database. Hibernate solves object-relational impedance mismatch problems by replacing direct, persistent database accesses with high-level object handling functions.
Hibernate is free software that is distributed under the GNU Lesser General Public License 2.1. Hibernate's primary feature is mapping from Java classes to database tables, and mapping from Java data types to SQL data types. Hibernate also provides data query and retrieval facilities. It generates SQL calls and relieves the developer from manual handling and object conversion of the result set.
8.3 METHODOLOGIES / ALGORITHM DETAILS
8.3.1 Algorithm 1 / Pseudo Code
Step 1: The user must register and log in before using the search engine.
Step 2: The user can search for a place or event using the search engine.
Step 3: QA: Query Analysis
Step 3.1: The user's query is tokenized.
Step 3.2: The tokens are analyzed, and a resolved query related to the domain is generated.
Step 3.3: The resolved query is sent to the data crawler to fetch data related to the topic.
Step 4: Data crawler
Step 4.1: The data crawler uses different APIs to collect data related to the query.
Step 4.2: The data related to the query is scored.
Step 4.3: A collection of data or links is created, each with a score for its relevance to the topic.
Step 5: Answer Page
a) Data is displayed to the user in order of score.
b) The user can visit the source of the data for more information or download documents if available.
Step 6: Stop.
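Step 3.2 produces a resolved query that relates the user's tokens to the domain. One simple way to picture this is query expansion: each token is looked up in a map of related concepts, and the related concepts are appended to the query. The Java sketch below assumes such a map is already available; in the project it would come from the generated ontology. The class name, the map and the sample entries are hypothetical and serve only as an illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of query expansion with a small concept map.
class QueryExpander {

    // Returns the original tokens plus their related concepts,
    // without duplicates and in first-seen order.
    static Set<String> expand(List<String> tokens, Map<String, List<String>> related) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String token : tokens) {
            expanded.add(token);
            List<String> concepts = related.get(token);
            if (concepts != null) {
                expanded.addAll(concepts);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Made-up relations standing in for the generated ontology.
        Map<String, List<String>> related = new HashMap<>();
        related.put("fort", Arrays.asList("killa", "castle"));
        System.out.println(expand(Arrays.asList("fort", "pune"), related));
        // prints [fort, killa, castle, pune]
    }
}
```

The expanded token set can then be passed to the data crawler in step 3.3 in place of the raw query.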
8.4 VERIFICATION AND VALIDATION FOR ACCEPTANCE
The development life cycle has several phases. Verification and validation (V&V) are performed in each of these phases.
8.4.1 V&V TASKS – PLANNING
Verification of contract
Evaluation of concept document
Performing risk analysis
8.4.2 V&V TASKS – REQUIREMENT PHASE
Evaluation of software requirements
Evaluation / analysis of the interfaces
Generation of system test plan
Generation of acceptance test plan
8.4.3 V&V TASKS – DESIGN PHASE
Evaluation of software design
Evaluation / analysis of the interfaces (UI)
Generation of integration test plan
Generation of component test plan
Generation of test design
8.4.4 V&V TASKS – IMPLEMENTATION PHASE
Evaluation of source code
Evaluation of documents
Generation of test cases
Generation of test procedure
Execution of component test cases
8.4.5 V&V TASKS – TEST PHASE
Execution of system test cases
Execution of acceptance test cases
Updating of traceability metrics
Risk analysis
8.4.6 V&V TASKS – INSTALLATION AND CHECKOUT PHASE
Audit of installation and configuration
Final test of the installation candidate build
Generation of final test report
CHAPTER 9 SOFTWARE TESTING
9.1 TYPE OF TESTING USED
The Testing phase forms an important part of the software development life cycle. Any software product has to be tested thoroughly before it is delivered to the end customer. Well tested software with limited features is certainly better than the one having many features with only a few of them working. This document provides a general overview of the testing strategy adopted for testing the project. This document is a procedural guide for listing the testing activities that should be carried out for the Project ‘Ontology Learning from Unstructured Information Sources for User Query’. It describes the software test environment for testing, identifies the tests to be performed, and provides schedules for test activities.
[I] Testing Strategy
This involves testing of individual modules. Here we have tested the individual modules written for various operations under Unit Testing.
• User Interface Testing
• Uploading Document
9.2 TEST CASES AND TEST RESULTS
[II] Features to Be Tested
Table 1. User Interface Test Cases
Sr. No. | Features / Functions to be tested
1. | Check that the cursor is in the search text box at the starting position.
2. | An advanced search tab should be available to set search options.
3. | Search functionality should work when a keyword is typed and Enter is pressed on the keyboard.
4. | Search functionality should not work when no keyword is entered; the user should be prompted to enter one.
5. | The list of links/paths displayed should match at least one of the keywords.
6. | The most suitable matches should be displayed at the top of the list.
7. | Links should open in the same window.
8. | Allow the user to return to the search again by resetting or refreshing.
9. | Check the page resolution.
Table 2. Upload Document
Sr. No. | Features / Functions to be tested
1. | Check the search response time.
2. | Check the total number of results to be displayed on one page.
3. | Check that unvisited URLs are colored blue and visited URLs are colored maroon.
CHAPTER 10 RESULTS
10.1 SCREENSHOTS
Outputs / Snap shots of the results
10.2 OUTPUTS
Outputs / Snap shots of the results
CHAPTER 11 DEPLOYMENT AND MAINTENANCE
11.1 INSTALLATION AND UN-INSTALLATION
11.1.1 Java
1. Download the JDK from http://java.sun.com
2. Go to the folder where the JDK was downloaded.
3. Launch the installer. If a security warning is shown, click the Run button.
4. The installer should begin loading.
5. The End User License Agreement (EULA) is presented. After reading it, click the "Accept" button.
6. Choose the default installation and click the Next button.
7. You will then be asked to install the Java Runtime Environment (JRE). Click the "Next" button.
8. Now click the Finish button.
11.1.2 Eclipse
1. To use Eclipse for Java programming, you first need to install the Java Development Kit (JDK). Read 'How to Install JDK for Windows'.
2. Download Eclipse from https://www.eclipse.org/downloads. Under 'Get Eclipse Neon', click 'Download Packages'. For beginners, choose the 3rd entry, 'Eclipse IDE for Java Developers' (32-bit or 64-bit) (e.g., 'eclipse-java-neon-2-win32-x86_64.zip', 161 MB), and download it.
3. To install Eclipse, simply unzip the downloaded file into a directory of your choice (e.g., 'd:').
There is no need to run any installer. Moreover, you can simply delete the entire Eclipse directory when it is no longer needed (without running any uninstaller). You are free to move or rename the directory. You can install (unzip) multiple copies of Eclipse on the same machine.
11.2 USER HELP
The software provides a help file so that users can understand and easily use the system. It explains how the software is used and how to proceed with it.
CHAPTER 12 CONCLUSION AND FUTURE SCOPE
12.1 CONCLUSION
The proposed system automates the generation of an ontology by extracting semantic relationships between concepts from unstructured information sources. The system is fully unsupervised, as it requires neither training nor user annotation. The system searches documents only in English and Marathi. The main objective is to place the pages most relevant to a query in the top positions of the search results returned by a search engine.
12.2 FUTURE SCOPE
Thus it is a reliable method that can be used for ranking Web pages of different domains.
ANNEXURE A REFERENCES
[1] Elizabeth Liddy. 'Enhanced text retrieval using natural language processing'. Bulletin of the American Society for Information Science, 24, pp. 14-16, 1998.
[2] Philipp Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, 2006.
[3] L. Ding, T. Finin, A. Joshi, R. Pan, R. Cost, Y. Peng, et al. 'Swoogle: a search and metadata engine for the semantic Web'. Proceedings of the 13th ACM Conference on Information and Knowledge Management, ACM Press, New York, USA, pp. 652-659, 2004.
[4] Aliaa A.A. Youssif, Atef Z. Ghalwash, and Eslam Amer. 'KPE: An Automatic Keyphrase Extraction Algorithm'. IEEE Proceedings of the International Conference on Information Systems and Computational Intelligence (ICISCI 2011), pp. 103-107, 2011.
[5] Chintan Patel, Kaustubh Supekar, Yugyung Lee, E. K. Park. 'OntoKhoj: a semantic Web portal for ontology searching, ranking and classification'. Proceedings of the 5th ACM International Workshop on Web Information and Data Management, pp. 58-61, 2003.
ANNEXURE B
LABORATORY ASSIGNMENTS ON PROJECT ANALYSIS OF ALGORITHMIC DESIGN
• To develop the problem under consideration and justify feasibility using the concept of knowledge canvas and idea matrix.
Theory :-
Innovation depends on ideas generated through creativity, and on the knowledge and research that make it possible to put the ideas to work. However, these two activities are very dependent on the people who perform them. The article that must be developed is a knowledge and idea management system. Creativity plays an important role in the innovation process, as it generates the ideas that will initiate innovation. Ideas emerge at every level of the process, and they correspond to various challenges, such as responding to an issue, meeting a target objective, solving a problem, or making use of knowledge. The IDEA matrix helps us to understand each and every stage of project improvement and development. Thus we can make the IDEA matrix as follows:
• Problem statement feasibility assessment using NP-hard/NP-complete or satisfiability issues using modern algebra and/or mathematical model.
Theory :-
What is P?
P is the set of all decision problems that can be solved in polynomial time by a deterministic Turing machine. Since such a problem can be solved in polynomial time, a proposed solution to it can also be verified in polynomial time.
Therefore P is a subset of NP.
What is N?
The ‘N’ in ‘NP’ refers to the fact that you are not bound by the normal way a computer works, which is step-by-step. The ‘N’ actually stands for ‘Non-deterministic’. This means that you are dealing with an amazing kind of computer that can run things simultaneously, or could somehow guess the right way to do things. So this ‘N’ computer can solve many more problems in ‘P’ time; for example, it can just clone copies of itself when needed. So, some problems that appear to take dramatically longer as the problem gets harder (i.e. are not known to be in ‘P’) could be solved quickly on this amazing ‘N’ computer, and so are in ‘NP’. Thus ‘NP’ means ‘we can solve it in polynomial time if we can break the normal rules of step-by-step computing’.
What is NP?
‘NP’ means ‘we can solve it in polynomial time if we can break the normal rules of step-by-step computing’. Equivalently, it is the set of decision problems for which a proposed solution can be verified in polynomial time.
What is NP-Complete?
Since this amazing ‘N’ computer can also do anything a normal computer can, we know that ‘P’ problems are also in ‘NP’. So, the easy problems are in ‘P’ (and ‘NP’), but the hardest problems in ‘NP’, those to which every other ‘NP’ problem can be reduced in polynomial time, are called ‘NP-complete’. No polynomial-time algorithm is known for any of them. It is like saying there are things that People can do (‘P’), there are things that Super People can do (‘SP’), and there are things that, as far as anyone knows, *only* Super People can do (‘SP-complete’).
What is NP Hard?
A problem is NP-hard if an algorithm for solving it can be translated into one for solving any NP (non-deterministic polynomial time) problem. NP-hard therefore means ‘at least as hard as any NP problem’, although it might, in fact, be harder.
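The difference between solving a problem and verifying a proposed solution can be sketched with Boolean satisfiability (SAT), a well-known NP-complete problem. The sketch below is illustrative only and is not part of the project code: the formula encoding and the function names are assumptions. Checking an assignment takes time proportional to the size of the formula, whereas the naive solver may have to try all 2^n assignments.

```python
from itertools import product

# A CNF formula is a list of clauses; each clause is a list of non-zero integers.
# The literal k means "variable k is true" and -k means "variable k is false".


def verify(formula, assignment):
    """Check a proposed assignment in time linear in the size of the formula."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def brute_force_solve(formula, num_vars):
    """Try every assignment; the loop runs up to 2**num_vars times."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(values)}
        if verify(formula, assignment):
            return assignment
    return None  # No assignment satisfies the formula.


# Hypothetical formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
solution = brute_force_solve(formula, 3)
print(solution)
```

The verifier is the polynomial-time check that places SAT in NP; the exponential search is what a deterministic computer falls back on when no faster method is known.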
Economic Feasibility:
Economic analysis is the most frequently used technique for evaluating the effectiveness of a proposed system. More commonly known as cost/benefit analysis, the procedure is to determine the benefits and savings that are expected from a proposed system and compare them with the costs. If the benefits outweigh the costs, a decision is taken to design and implement the system. Otherwise, further justification or alterations in the proposed system will have to be made if it is to have a chance of being approved. This is an ongoing effort that improves in accuracy at each phase of the system life cycle.
Operational Feasibility:
This is mainly related to human and organizational aspects. The point to be considered is that, as most network packet forwarding systems are hidden from the end user, direct interaction by end users is not a concern for this system, but they will get better service over the network. This feasibility study is carried out by a small group of people who are familiar with information system techniques and are skilled in the system analysis and design process.
Conclusion:
Hence we have successfully stated the feasibility assessment using NP-hard/NP-complete and a mathematical model.
ANNEXURE C
LABORATORY ASSIGNMENTS ON PROJECT QUALITY AND RELIABILITY TESTING OF PROJECT DESIGN
It should include assignments such as
• Use of divide and conquer strategies to exploit distributed/parallel/concurrent processing of the above to identify objects, morphisms, overloading in functions (if any), and functional relations and any other dependencies (as per requirements). It can include Venn diagrams, state diagrams, function relations, and I/O relations; use this to derive objects, morphism, overloading.
• Use of the above to draw functional dependency graphs and relevant software modeling methods and techniques, including UML diagrams or other necessities, using appropriate tools.
• Testing of the project problem statement using generated test data (using mathematical models, GUI, function testing principles, if any), selection and appropriate use of testing tools, and testing of the UML diagrams' reliability. Also write test cases [black box testing] for each identified function. You can use Mathematica or an equivalent open source tool for generating test data.
• Additional assignments by the guide. If the project type is Entrepreneur, refer [?],[?],[?],[?]
ANNEXURE D PROJECT PLANNER
Using Planner or a similar project management tool.
ANNEXURE E
REVIEWERS COMMENTS OF PAPER SUBMITTED
(At least one technical paper must be submitted in Term-I on the project design in the conferences/workshops in IITs, Central Universities or UoP Conferences or equivalent International Conferences Sponsored by IEEE/ACM)
1. Paper Title:
2. Name of the Conference/Journal where paper submitted:
3. Paper accepted/rejected:
4. Review comments by reviewer:
5. Corrective actions if any:
ANNEXURE F PLAGIARISM REPORT
Plagiarism report
ANNEXURE G
TERM-II PROJECT LABORATORY ASSIGNMENTS
1. Review of design and necessary corrective actions taking into consideration the feedback report of Term I assessment, and other competitions/conferences participated in, like IIT, Central Universities, University Conferences or equivalent centers of excellence etc.
2. Project workstation selection, installations along with setup and installation report preparations.
3. Programming of the project functions, interfaces and GUI (if any) as per 1st Term term-work submission using corrective actions recommended in Term-I assessment of term-work.
4. Test tool selection and testing of various test cases for the project performed and generate various testing result charts, graphs etc. including reliability testing.
Additional assignments for the Entrepreneurship Project:
5. Installations and Reliability Testing Reports at the client end.
ANNEXURE H
INFORMATION OF PROJECT GROUP MEMBERS
One page for each student.
1. Name:
2. Date of Birth:
3. Gender:
4. Permanent Address:
5. E-Mail:
6. Mobile/Contact No.:
7. Placement Details:
8. Paper Published: