Intelligent Analysis Of Weblog Mining

A PRELIMINARY PROJECT REPORT ON Intelligent Analysis Of Weblog Mining
In this project, we investigate a new way to find the most evident co-clusters of users and their corresponding web pages in a web log dataset, using a frequent super-sequence mining technique.
Mining a web log dataset can reveal interesting and helpful information. There are three kinds of mining on web log data: web usage mining, web structure mining, and web content mining. In our research, we investigate web page structure and find the most evident groups of users and web pages. Big data is now everywhere. When facing a huge volume of web logs, it is not always necessary to group all the users in a dataset into different clusters; sometimes finding the major dominant user groups and their corresponding web pages is more important. Through experiments, we find interesting results.
INDEX
1 Synopsis
1.1 Project Title
1.2 Project Option
1.3 Internal Guide
1.4 Technical Keywords
1.5 Problem Statement
1.6 Abstract
1.7 Goals and Objectives
1.8 Relevant mathematics associated with the Project
1.9 Review of Conference/Journal Papers supporting Project idea
2 Technical Keywords
2.1 Area of Project
2.2 Technical Keywords
3 Introduction
3.1 Project Idea
3.2 Literature Survey
4 Problem Definition and scope
4.1 Problem Statement
4.1.1 Goals and objectives
4.2 Software context
4.3 Major Constraints
4.4 Methodologies of Problem solving and efficiency issues
4.5 Outcome
4.6 Applications
4.7 Hardware Resources Required
4.8 Software Resources Required
5 Project Plan
5.1 Project Estimates
5.1.1 Reconciled Estimates
5.1.2 Project Resources
5.2 Risk Management w.r.t. NP Hard analysis
5.2.1 Risk Identification
5.2.2 Risk Analysis
5.2.3 Overview of Risk Mitigation, Monitoring, Management
5.3 Project Schedule
5.3.1 Project task set
5.3.2 Task network
5.4 Project Estimates
5.4.1 Timeline Chart
5.5 Team Organization
5.5.1 Team structure
5.5.2 Management reporting and communication
6 Software requirement specification (SRS is to be prepared using relevant mathematics derived and software engg. Indicators in Annex A and B)
6.1 Introduction
6.1.1 Purpose and Scope of Document
6.1.2 Overview of responsibilities of Developer
6.2 Usage Scenario
6.2.1 User profiles
6.2.2 Use-cases
6.2.3 Use Case View
6.3 Data Model and Description
AISSMS IOIT, Department of Computer Engineering 2015 IV
6.3.1 Data Description
6.3.2 Data Flow Diagram
6.3.3 Activity Diagram
6.3.4 Non Functional Requirements
6.3.5 Software Interface Description
7 Detailed Design Document using Appendix A and B
7.1 Introduction
7.2 Architectural Design
7.2.1 Internal software data structure
7.2.2 Global data structure
7.2.3 Temporary data structure
7.2.4 Database description
7.3 Component Design
7.3.1 Deployment Diagram
7.3.2 Class Diagram
8 Summary and Conclusion
Annexure A Laboratory assignments on Project Analysis of Algorithmic Design
A.1 Canvas Diagram
A.2 Problem Description
Annexure B Laboratory assignments on Project Quality and Reliability Testing of Project Design
B.1 Testing
9 Project Planner
List of Figures
5.1 Waterfall Diagram
5.2 Task Network
5.3 Timeline Chart
6.1 Use case diagram
6.2 Data Flow Level-0
6.3 Data Flow Level-1
6.4 Data Flow Level-2
6.5 Activity diagram
7.1 Architecture diagram
7.2 Deployment Diagram
7.3 Class Diagram
A.1 Canvas Diagram
A.2 Idea Matrix
A.3 Problem Description
B.1 Blackbox Testing
List of Tables
4.1 Hardware Requirements
5.1 Risk Table
5.2 Risk Probability definitions [?]
5.3 Risk Impact definitions [?]
6.1 Use Cases
A.1 IDEA Matrix
CHAPTER 1
SYNOPSIS
1.1 PROJECT TITLE
Intelligent Analysis of Web Log Mining
1.2 PROJECT OPTION
Internal Project
1.3 INTERNAL GUIDE
Prof. Mrs. S.N.Zaware
1.4 TECHNICAL KEYWORDS
Super-sequence mining, Pattern mining, Web log dataset, Co-clustering, Recommendation, Pattern-oriented partial clustering
1.5 PROBLEM STATEMENT
The problem is to find the most evident groups of users and their corresponding web pages simultaneously, that is, to group the sequences and the features into partial clusters at the same time.
Given a database D = {s1, s2, …, sk} of data sequences and a set P = {p1, p2, …, pm} of frequent patterns in D, the problem is to divide P into a set of n clusters {c1, c2, …, cn}.
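The notion of a pattern being frequent in D can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: the report does not define how a pattern is matched, so the sketch assumes a pattern occurs in a sequence when its pages appear in the same order, not necessarily contiguously. The class name PatternSupport and the page names are hypothetical.

```java
import java.util.List;

// Illustrative sketch only: counts how often a page pattern occurs in a
// database D of access sequences. The matching rule is an assumption.
class PatternSupport {

    // True if `pattern` occurs in `sequence` in order (gaps allowed).
    static boolean contains(List<String> sequence, List<String> pattern) {
        int matched = 0;
        for (String page : sequence) {
            if (matched < pattern.size() && page.equals(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    // Fraction of the sequences in D that contain the pattern.
    static double support(List<List<String>> database, List<String> pattern) {
        if (database.isEmpty()) {
            return 0.0;
        }
        long hits = database.stream().filter(s -> contains(s, pattern)).count();
        return (double) hits / database.size();
    }

    public static void main(String[] args) {
        // Hypothetical sequences s1, s2, s3.
        List<List<String>> d = List.of(
                List.of("home", "products", "cart"),
                List.of("home", "about", "products"),
                List.of("blog", "home"));
        System.out.println(support(d, List.of("home", "products")));
    }
}
```

A pattern would then belong to P when its support reaches a chosen minimum threshold; the threshold value is a tuning choice that the report does not specify.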
1.6 ABSTRACT
In this project, we investigate a new way to find the most evident co-clusters of users and their corresponding web pages in a web log dataset, using a frequent super-sequence mining technique. We mine the web log and co-cluster it in order to produce good recommendations and to generate reports for business analysis. We create a web site, capture its live web log, and co-cluster the log entries on the basis of similarity between objects. Mining the web log dataset can reveal interesting and helpful information. There are three kinds of mining on web log data: web usage mining, web structure mining, and web content mining.
In our research, we investigate web page structure and find the most evident groups of users and web pages. Big data is now everywhere. When facing a huge volume of web logs, it is not always necessary to group all the users in a dataset into different clusters; sometimes finding the major dominant user groups and their corresponding web pages is more important. Through experiments, we find interesting results.
1.7 GOALS AND OBJECTIVES
• The main objective of web log mining is to extract interesting patterns from web access records and to assist users or site owners in finding something useful.
1.8 RELEVANT MATHEMATICS ASSOCIATED WITH THE PROJECT
System Description:
• Input: Website web log
• Output: Clusters
• Data Structures: Schema, divide and conquer strategies to exploit parallel processing, constraints.
• Functions: Clustering
• Success Conditions: Clustering of the data
• Failure Conditions: Data size is too large to manage
1.9 REVIEW OF CONFERENCE/JOURNAL PAPERS SUPPORTING PROJECT IDEA
1. Weblog Mining, Privacy Issues and Application of Web Log Mining (2014)
2. Analysis Of Web Logs And Web User In Web Mining (2011)
3. An implementation of Web Log Mining (2014)
4. Frequent Pattern Mining in Web Log Data (2006)
5. Multi-Way Distributional Clustering via Pairwise Interactions (2005)
6. Unifying Dependent Clustering and Disparate Clustering for Non-homogeneous Data (2010)
CHAPTER 2
TECHNICAL KEYWORDS
2.1 AREA OF PROJECT
Web Mining, Data Mining
2.2 TECHNICAL KEYWORDS
Super-sequence mining, Pattern mining, Web log dataset, Co-clustering, Recommendation, Pattern-oriented partial clustering
1. C. Computer Systems Organization
(a) C.2 COMPUTER-COMMUNICATION NETWORKS
i. C.2.4 Distributed Systems
A. Client/server
ii. C.2.5 Local and Wide-Area Networks
A. Ethernet (e.g., CSMA/CD)
B. Internet (e.g., TCP/IP)
(b) C.4 PERFORMANCE OF SYSTEMS
i. Fault tolerance
ii. Reliability, availability, and serviceability
(c) C.5 COMPUTER SYSTEM IMPLEMENTATION
i. C.5.3 Computers
A. Personal computers
B. Portable devices (e.g., laptops, personal digital assistants)
ii. C.5.5 Servers
2. D. Software
(a) D.1 PROGRAMMING TECHNIQUES
i. D.1.5 Object-oriented Programming
(b) D.2 SOFTWARE ENGINEERING (K.6.3)
i. D.2.1 Requirements/Specifications (D.3.1)
A. Methodologies (e.g., object-oriented, structured)
B. Tools
ii. D.2.5 Testing and Debugging
A. Testing tools (e.g., data generators, coverage testing)
(c) D.4 OPERATING SYSTEMS
i. D.4.6 Security and Protection
A. Authentication
B. Verification
3. E. Data
(a) E.1 DATA STRUCTURES
i. Tables
4. H. Information Systems
(a) H.2 DATABASE MANAGEMENT
i. H.2.0 General
A. Security, integrity, and protection
ii. H.2.3 Languages
A. Data description languages (DDL)
B. Data manipulation languages (DML)
C. Query languages
CHAPTER 3
INTRODUCTION
3.1 PROJECT IDEA
• The problem of web log mining is the automated analysis of web access logs in order to discover trends and regularities (patterns) in users' behavior. The discovered patterns are usually used to improve web site organization and presentation. The term "adaptive web sites" has been proposed to denote such automatically transformed web sites.
• One of the most interesting web log mining methods is web user clustering. The problem of web user clustering (or segmentation) is to use web access log files to partition a set of users into clusters such that the users within a cluster are more similar to each other than to users from different clusters. The discovered clusters can then help in on-the-fly transformation of the web site content. In particular, web pages can be automatically linked by artificial hyperlinks. The idea is to match an active user's access pattern with one or more of the clusters discovered from the web log files. Pages in the matched clusters that the user has not yet explored may serve as navigational hints for the user to follow.
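The matching step described above can be sketched in Java as follows. This is a minimal illustration, not the project's algorithm: it assumes each discovered cluster is summarized only by the set of pages its users visited, and it treats a cluster as matched when a chosen fraction of the active user's pages falls inside it. The class name NavigationHints, the minOverlap parameter, and the page names are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: suggests unexplored pages from clusters that
// match the active user's access path. The overlap rule is an assumption.
class NavigationHints {

    static Set<String> hints(List<String> activePath,
                             List<Set<String>> clusters,
                             double minOverlap) {
        Set<String> visited = new HashSet<>(activePath);
        Set<String> result = new TreeSet<>(); // sorted for stable output
        if (visited.isEmpty()) {
            return result;
        }
        for (Set<String> cluster : clusters) {
            // Share of the user's visited pages that belong to this cluster.
            long shared = visited.stream().filter(cluster::contains).count();
            if ((double) shared / visited.size() >= minOverlap) {
                for (String page : cluster) {
                    if (!visited.contains(page)) {
                        result.add(page); // matched cluster, page not yet explored
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> clusters = List.of(
                Set.of("home", "products", "cart", "checkout"),
                Set.of("blog", "about", "contact"));
        System.out.println(hints(List.of("home", "products"), clusters, 0.5));
    }
}
```

A simple set overlap keeps the sketch short; a pattern-based match on ordered access paths would be closer to the sequence-oriented approach the report describes.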
3.2 LITERATURE SURVEY
Weblog Mining, Privacy Issues and Application of Web Log Mining (2014)
Authors: Amarjeet Singh Yumnam, Y. Chaitanya Sreeram, Shaik Abdul Naeem
This paper gives a detailed overview of web log mining. It shows how a dataset can be prepared using web content, structure, and usage mining. It then explains data preprocessing, starting with the removal of ads, images, entries with failed HTTP status codes, etc. After cleaning, it collects web objects containing URLs, session user, duration, etc. Three approaches are suggested for records missed because of proxy servers and caching. Once the data is captured and stored, it is formatted. Finally, patterns in web users' behavior are found and analyzed while maintaining privacy, and mining is done for recommendation.
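The cleaning step summarized above (dropping image requests and failed requests) could look roughly like the Java sketch below. It is a sketch under assumptions, not the paper's procedure: it assumes log lines in the Common Log Format, treats only 2xx status codes as successful, and uses a small hypothetical list of image extensions. The class name LogCleaner is also hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch only: keeps successful, non-image page requests.
// Assumes Common Log Format lines such as:
// 127.0.0.1 - - [10/Oct/2015:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326
class LogCleaner {

    // Groups: 1 = host, 2 = timestamp, 3 = method, 4 = URL, 5 = status code.
    private static final Pattern CLF = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) ([^\\s\"]+)[^\"]*\" (\\d{3}) \\S+.*$");

    // Assumed list of image extensions to discard.
    private static final Pattern IMAGE = Pattern.compile(
            ".*\\.(gif|jpe?g|png|ico)$", Pattern.CASE_INSENSITIVE);

    static boolean keep(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.matches()) {
            return false; // malformed entry
        }
        int status = Integer.parseInt(m.group(5));
        String url = m.group(4);
        return status >= 200 && status < 300 && !IMAGE.matcher(url).matches();
    }

    static List<String> clean(List<String> lines) {
        return lines.stream().filter(LogCleaner::keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
                "127.0.0.1 - - [10/Oct/2015:13:55:36 +0530] \"GET /index.html HTTP/1.1\" 200 2326",
                "127.0.0.1 - - [10/Oct/2015:13:55:37 +0530] \"GET /logo.png HTTP/1.1\" 200 512",
                "127.0.0.1 - - [10/Oct/2015:13:55:38 +0530] \"GET /missing.html HTTP/1.1\" 404 209");
        clean(raw).forEach(System.out::println);
    }
}
```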
Analysis Of Web Logs And Web User In Web Mining (2011)
Authors: L.K.Joshila Grace, V. Maheshwari, Dhinaharan Nagamalai
This paper describes the contents of log files and where the log files actually reside. It also provides information about the status codes a server sends to its users/clients. The paper gives an overview of web mining and web usage mining, explains how web usage mining can be performed using a log file, and lists the steps involved. It also presents the idea of creating an extended log file and finding out the user's interests.
An implementation of Web Log Mining (2014)
Authors: Bhaiyalal Birla and Sachin Patel
This paper evaluates and implements frequent pattern analysis of web data using the Apriori algorithm, and proposes a new modified Apriori algorithm to improve performance. The memory and time usage of the Apriori and modified Apriori algorithms are compared.
Frequent Pattern Mining in Web Log Data (2006)
Authors: Renata Ivancsy and Istvan Vajk
This paper uses frequent pattern analysis to discover hidden information in web log data and to obtain the navigational behavior of users. It deals with the problem of discovering hidden information in the large amount of web log data collected by a server.
Multi-Way Distributional Clustering via Pairwise Interactions (2005)
Authors: Ron Bekkerman, Ran El-Yaniv, Andrew McCallum
This paper describes multi-way clustering based on two components. The first is an extension of the information-based objective to pairwise interactions, and the second is a clustering algorithm that can be viewed as a scheduled mixture of several clustering directions. Clustering proceeds both top-down and bottom-up. The approach improves on two-way clustering because it clusters the interacting variables simultaneously, giving multi-dimensional, multivariate, distributional clustering.
Unifying Dependent Clustering and Disparate Clustering for Non-homogeneous Data (2010)
Authors: M. Shahriar Hossain, Satish Tadepalli, Layne . Watson, Ian Davidson, Richard F. Helm, Naren Ramakrishnan
In this paper, clustering is performed on non-homogeneous data using two types of clustering: (1) dependent clustering and (2) disparate clustering. Both types are achieved using an optimization framework. The method works on both synthetic and real-world data. It uses Boolean criteria that allow constraints to be satisfied or violated in a smooth manner. Multivariate data is a bottleneck.
Crime Analysis using K-Means Clustering (2013)
Authors: Jyoti Agarwal, Renuka Nagpal, Rajni Sehgal
In this paper, the authors perform crime analysis using K-means, which partitions observations into k clusters by assigning each observation to the cluster with the nearest mean, so that observations within a cluster are more similar to each other than to those in other clusters. Its disadvantages are that it is applicable only when a mean is defined and that the number of clusters must be specified in advance. It is also not suitable for noisy data and cannot discover clusters with non-convex shapes.
CHAPTER 4
PROBLEM DEFINITION AND SCOPE
4.1 PROBLEM STATEMENT
The problem is to find the most evident groups of users and their corresponding web pages simultaneously, that is, to group the sequences and the features into partial clusters at the same time. Given a database D = {s1, s2, …, sk} of data sequences and a set P = {p1, p2, …, pm} of frequent patterns in D, the problem is to divide P into a set of n clusters {c1, c2, …, cn}.
4.1.1 Goals and objectives
Goals and Objectives:
• The main objective of web log mining is to extract interesting patterns from web access records and to assist users or site owners in finding something useful.
4.2 SOFTWARE CONTEXT
• The application is useful for business analysis and recommendation purposes.
4.3 MAJOR CONSTRAINTS
• Network failure
• Database failure
• Network breach
4.4 METHODOLOGIES OF PROBLEM SOLVING AND EFFICIENCY ISSUES
Greedy approach, Divide and Conquer approach, etc
4.5 OUTCOME
• Recommendations are provided on the basis of the clustering analysis.
4.6 APPLICATIONS
• Business analysis
• Good recommendations
4.7 HARDWARE RESOURCES REQUIRED
Parameter | Min Requirement | Justification
Server CPU Speed | 10 GHz | Large number of runtime requests to be processed.
Server RAM | 128 GB | Large database of user information.
I/O Operations/day | 850K | Increases capability to handle a large number of operations.
Client CPU Speed | 1.8 GHz | For normal processing
Client RAM | 2 GB | For normal processing
Table 4.1: Hardware Requirements
4.8 SOFTWARE RESOURCES REQUIRED
Platform:
1. Operating System: Windows 7, 8
2. Front End: Java
3. IDE: NetBeans 8.0
4. Back End: MySQL
CHAPTER 5
PROJECT PLAN
Requirement | Total Cost
Hardware (processor i3, i5) | Rs. 15,000
Travel cost for meeting purposes | Rs. 1,000
Telecommunications | Rs. 500
Others (library use) | Rs. 1,000
5.1 PROJECT ESTIMATES
5.1.1 Reconciled Estimates
5.1.1.1 Cost Estimate
Hardware and Software Purchases:
• Processor
• Server: 1
• Client: 3
• Software: MySQL, Java (for website), NetBeans.
Lines of Code:
Function | Estimated LOC
User Interfaces and Control Facilities (UICF) | 2,300
Database Management (DBM) | 3,435
Computer Graphics Display Facilities (CGDF) | 4,955
Peripheral Control Function (PCF) | 2,152
Design Analysis Modules (DAM) | 4,355
Estimated Lines Of Code | 17,197
5.1.1.2 Time Estimates
A schedule estimation rule of thumb [McConnell 1996]:
1. Estimate coding and unit testing for each use case. For a series of use cases that contains four web pages, two interfaces, five database tables and no data conversions, the estimate is: (4 x 1.5) + (2 x 3) + (5 x 1) + (0 x 1) = 17 man-weeks, or 680 hours.
2. Multiply the above result by 2.5 to account for analysis and testing of each use case. This gives 1700 hours as a project estimate.
3. Add 20 percent for each additional developer beyond the first. The base estimate assumes only one developer; adding more developers brings an inherent overhead for communication and coordination between team members. We have 4 developers, so we need to inflate the estimate by 60 percent (three additional developers). Multiply the base estimate by 60 percent and add that number to the base estimate:
0.6 x 1700 = 1020 hours
1020 + 1700 = 2720 total hours
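The arithmetic above can be reproduced with a short Java sketch. The weights (1.5, 3, 1, 1), the 2.5 multiplier and the 20 percent per extra developer come from the steps above; the 40-hour week is inferred from "17 man-weeks or 680 hours". The class and method names are hypothetical.

```java
// Illustrative sketch only: reproduces the rule-of-thumb schedule estimate.
class ScheduleEstimate {

    // Inferred from the report: 17 man-weeks = 680 hours.
    static final int HOURS_PER_WEEK = 40;

    // Coding and unit-testing effort in man-weeks for one series of use cases.
    static double codingWeeks(int webPages, int interfaces, int tables, int conversions) {
        return webPages * 1.5 + interfaces * 3.0 + tables * 1.0 + conversions * 1.0;
    }

    // Total hours after adding analysis/testing and team coordination overhead.
    static double totalHours(double codingWeeks, int developers) {
        double codingHours = codingWeeks * HOURS_PER_WEEK;
        double baseEstimate = codingHours * 2.5;        // analysis and testing
        double overhead = 0.20 * (developers - 1);      // 20% per extra developer
        return baseEstimate + baseEstimate * overhead;
    }

    public static void main(String[] args) {
        double weeks = codingWeeks(4, 2, 5, 0);
        System.out.println("Coding effort (man-weeks): " + weeks);
        System.out.println("Total hours: " + Math.round(totalHours(weeks, 4)));
    }
}
```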
5.1.2 Project Resources
Project resources: 4 people, hardware (i3 processor as the minimum hardware requirement), IDE, database, and concurrency.
5.2 RISK MANAGEMENT W.R.T. NP HARD ANALYSIS
5.2.1 Risk Identification
If the dataset is too large, the system will not manage all the data efficiently. A web dataset is a prerequisite for this project. Server failure can cause serious damage, as the website will not be accessible. A malfunction such as the server hanging can cause web logs to be generated with wrong data. The customer managers are fully committed to the project, as it can improve their business and thereby customer satisfaction.
Though the requirements are fully understood by the software engineering team, changes from customers may require changes in resources, which may increase cost and time. Similarly, though customers are fully involved in the definition of requirements, there is a good chance that during implementation their requirements are covered but the methodology changes.
The software engineering team has the right mix of skills. Changes in implementation tools may require acquiring knowledge of new tools. The project's requirements are initially stable, but suggestions made by customers can change them. The number of members on the project team is adequate to do the job, but if the time estimates change, more members may be required. All customers and end users agree on the importance of the project as long as the project benefits them.
5.2.2 Risk Analysis
The risks for the project can be analyzed within the constraints of time and quality.
ID | Risk Description | Probability | Impact (Schedule) | Impact (Quality) | Impact (Overall)
1 | Description 1 | Low | Low | High | High
2 | Description 2 | Low | Low | High | High
Table 5.1: Risk Table
Probability | Value Description
High | Probability of occurrence is > 75%
Medium | Probability of occurrence is 26–75%
Low | Probability of occurrence is < 25%
Table 5.2: Risk Probability definitions [?]
Impact | Value Description
Very high | > 10% Schedule impact or Unacceptable quality
High | 5–10% Schedule impact or Some parts of the project have low quality
Medium | < 5% Schedule impact or Barely noticeable degradation in quality
Low | Impact on schedule or Quality can be incorporated
Table 5.3: Risk Impact definitions [?]
5.2.3 Overview of Risk Mitigation, Monitoring, Management
Following are the details for each risk.
Risk ID: 1
Risk Description: Server failure
Category: Requirements
Source: Software requirement specification document.
Probability: Low
Impact: High
Response: Mitigate
Strategy: Increase in size of data storage
Risk Status: Identified
Risk ID: 2
Risk Description: Server hangs
Category: Development Environment Requirements
Source: Software Design Specification documentation review.
Probability: Low
Impact: High
Response: Mitigate
Strategy: Better testing will resolve this issue.
Risk Status: Identified
Risk ID: 3
Risk Description: Database cannot handle more data
Category: Technology
Source: This was identified during early development and testing.
Probability: Low
Impact: Very High
Response: Accept
Strategy: Should switch to big data
Risk Status: Identified
5.3 PROJECT SCHEDULE
5.3.1 Project task set
Major tasks in the project stages are:
• Task 1: Designing the website.
• Task 2: Database connectivity.
• Task 3: Capturing web logs efficiently.
• Task 4: Clustering and making data available for analysis by the administrator.
5.3.2 Task network
5.4 PROJECT ESTIMATES
5.4.1 Timeline Chart
A project timeline chart is presented. This may include a timeline for the entire project.
Figure 5.3: Timeline Chart
5.5 TEAM ORGANIZATION
5.5.1 Team structure
• Sheetal Malpathak: Coding.
• Suffiyana Shiledar: Testing.
• Alsaba Shaikh: Documentation.
• Neha Harpale: Design.
5.5.2 Management reporting and communication
Mechanisms for progress reporting and inter/intra team communication are identified as per the assessment sheet and lab timetable.
CHAPTER 6
SOFTWARE REQUIREMENT SPECIFICATION (SRS IS TO BE PREPARED USING RELEVANT MATHEMATICS DERIVED AND SOFTWARE ENGG. INDICATORS IN ANNEX A AND B)
6.1 INTRODUCTION
6.1.1 Purpose and Scope of Document
A software requirements specification (SRS) is a description of a software system to be developed. It lays out functional and non-functional requirements and includes a set of use cases that describe the interactions users will have with the software.
6.1.2 Overview of responsibilities of Developer
Activities carried out by the developer:
• Evaluate, assess and recommend software and hardware solutions.
• Develop software, architecture, specifications and technical interfaces.
• Develop user interfaces and client displays.
• Design, initiate and handle technical designs and complex application features.
• Build flexible data models and seamless integration points.
• Innovate and develop high-value technology solutions to streamline processes.
• Initiate and drive major changes in programs, procedures and methodology.
6.2 USAGE SCENARIO
There are two scenarios:
• We develop our own site and take a log file from its server. Clustering is first performed on those web logs, followed by co-clustering, and a report or recommendation is generated on the basis of the clustering.
• We take a ready-made server log file and then perform clustering and co-clustering, and a report or recommendation is generated on the basis of the clustering.
6.2.1 User profiles
The profiles of all user categories are described here (actors and their descriptions). There are different types of users:
• Users of a particular website: users who get recommendations for particular products on that site.
• Business analyst: product analysts are important members of a product development team, providing other members with information on the business and market requirements for a product. They use their skills to identify customers' product requirements and ensure that the team takes account of the customer perspective throughout the product development process.
6.2.2 Use-cases
In software and systems engineering, a use case is a list of action or event steps, typically defining the interactions between a role (known in the Unified Modeling Language as an actor) and a system, to achieve a goal. The actors are the user and the business analyst.
Sr No. | Use Case | Description | Actors | Assumptions
1 | Use Case 1 | Admin has all rights | Admin | Report is generated on the basis of the clustering analysis
Table 6.1: Use Cases
6.2.3 Use Case View
Figure 6.1: Use case diagram
6.3 DATA MODEL AND DESCRIPTION
6.3.1 Data Description
A data object is a representation of some structured data. The web log dataset is the input to the system. Web servers maintain log files listing every request made to the server.
6.3.2 Data Flow Diagram
6.3.2.3 Level 2 Data Flow Diagram
6.3.3 Activity Diagram
Activity Diagrams
6.3.4 Non Functional Requirements:
• Interface Requirements: Website (WordPress), web browser, bookmarklet, JavaScript.
• Performance Requirements: For high performance, an internet connection with good speed must be available. The storage capacity must be large enough. The database must be free of redundancy.
• Software quality attributes: The service will be available whenever an internet connection is present.
6.3.5 Software Interface Description
• The interface for accessing relational databases from Java is Java Database Connectivity (JDBC). Via JDBC you create a connection to the database, issue database queries and updates, and receive the results. JDBC provides an interface that allows you to perform SQL operations independently of the particular database in use. To use JDBC, you require the database-specific implementation of the JDBC driver.
• Internet connectivity is required, as we are collecting web logs from a web server.
CHAPTER 7
DETAILED DESIGN DOCUMENT USING APPENDIX A AND B
7.1 INTRODUCTION
This document specifies the design that is used to solve the problem of the product.
7.2 ARCHITECTURAL DESIGN
NetBeans is a software development platform written in Java. The NetBeans Platform allows applications to be developed from a set of modular software components called modules. Applications based on the NetBeans Platform, including the NetBeans integrated development environment (IDE), can be extended by third-party developers. The NetBeans IDE is primarily intended for development in Java, but also supports other languages, in particular PHP, C/C++ and HTML5. NetBeans is cross-platform and runs on Microsoft Windows, Mac OS X, Linux, Solaris and other platforms supporting a compatible JVM.
7.2.1 Internal software data structure
Web log dataset
7.2.2 Global data structure
Analyzed data
7.2.3 Temporary data structure
Processing items.
7.2.4 Database description
MySQL database
7.3 COMPONENT DESIGN
Class diagram, deployment diagram
7.3.1 Deployment Diagram
Figure 7.2: Deployment Diagram
7.3.2 Class Diagram
Figure 7.3: Class Diagram
CHAPTER 8
SUMMARY AND CONCLUSION
Conclusions and Future Work:
We considered the problem of clustering web access sequences. Due to the limitations of the existing clustering methods, we introduced a new algorithm, which uses frequent patterns to generate both the clustering model and the cluster contents. The algorithm iteratively merges smaller, similar clusters until the requested number of clusters is reached. In the absence of a well-defined metric space, we propose an inter-cluster similarity measure based on co-occurrence to be used in cluster merging. An important feature of the algorithm is that it not only divides the web users into clusters but also delivers a classification model that can be used to classify future web users. Since the model is formed by a set of frequent patterns to be contained, classifying a new web user's access path simply consists of checking whether it contains patterns from any of the cluster descriptions. If the new access path contains patterns from different clusters, then it belongs to several clusters with different membership probabilities. In future, we can use big data for analysis purposes.
ANNEXURE A
LABORATORY ASSIGNMENTS ON PROJECT ANALYSIS OF ALGORITHMIC DESIGN
A.1 CANVAS DIAGRAM
The knowledge canvas represents the identification of the opportunity for the product. Feasibility is represented w.r.t. the business perspective.
I | D | E | A
Increase | Drive | Educate | Accelerate
Improve | Deliver | Evaluate | Associate
Ignore | Decrease | Eliminate | Avoid
Table A.1: IDEA Matrix
A.2 PROBLEM DESCRIPTION:
NP-hard problems do not have to be in NP, and they do not have to be decision problems. The precise definition here is that a problem X is NP-hard if there is an NP-complete problem Y such that Y is reducible to X in polynomial time.
Figure A.3: Problem Description
ANNEXURE B
LABORATORY ASSIGNMENTS ON PROJECT QUALITY AND RELIABILITY TESTING OF PROJECT DESIGN
B.1 TESTING
Figure B.1: Blackbox Testing
CHAPTER 9
PROJECT PLANNER
