
A SYSTEM TO FILTER UNWANTED MESSAGES FROM OSN USER WALLS

INTRODUCTION
Online Social Networks (OSNs) are today one of the most popular interactive media for communicating, sharing, and disseminating a considerable amount of information about human life. Daily and continuous communication implies the exchange of several types of content, including free text, image, audio, and video data. In OSNs, information filtering can also be used for a different, more sensitive purpose. This is because OSNs make it possible to post or comment on other posts in particular public or private areas, generally called walls. Information filtering can therefore give users the ability to automatically control the messages written on their own walls by filtering out unwanted messages.

SCOPE OF THE PROJECT


An OSN architecture has three layers: the graphical user interface, the social network application, and the social network manager. The social network manager handles basic functionality such as profile management and network-based functions. This project focuses on the other two layers and adds new conditions to them. The application layer contains a short text classifier and a content-based message filter. The short text classifier categorizes messages according to their content. The content-based message filter applies blacklists and filtering policies. The system first determines the relationship between the wall owner and the message sender, then filters the message and computes its class probabilities using the classifier. The probability result is then sent to the user. The proposed system thus gives users direct control over which kinds of messages are displayed on their walls.

LITERATURE SURVEY

Title: BoosTexter: A Boosting-based System for Text Categorization
Author: Robert E. Schapire, Yoram Singer
Year: 2000
We adopt a different approach in which we use two extensions of AdaBoost that were specifically intended for multiclass, multi-label data. In the first extension, the goal of the learning algorithm is to predict all and only all of the correct labels. Thus, the learned classifier is evaluated in terms of its ability to predict a good approximation of the set of labels associated with a given document. In the second extension, the goal is to design a classifier that ranks the labels so that the correct labels will receive the highest ranks. We next describe BoosTexter, a system which embodies four versions of boosting based on these extensions, and we discuss the implementation issues that arise in multi-label text categorization.

Title: A Comparison of Classifiers and Document Representations for the Routing Problem
Author: H. Schutze, D.A. Hull, J.O. Pedersen
Year: 1995
We compare two approaches to document routing: relevance feedback via query expansion, and statistical classification with error minimization. We show that advanced classification algorithms perform 10-15% better than relevance feedback on the Tipster document collection. Since learning algorithms based on error minimization and numerical optimization are computationally intensive and prone to overfitting in a high-dimensional feature space, it is necessary to apply some method of dimensionality reduction. We examine two different approaches: latent semantic indexing and feature selection of terms using a χ² test of non-independence.

Title: Content-Based Book Recommending Using Learning for Text Categorization
Author: Raymond J. Mooney, Loriene Roy
Year: 1999
Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use social filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations. These experiments are based on ratings from random samplings of items, and we discuss problems with previous experiments that employ skewed samples of user-selected examples to evaluate performance.

Title: Content-based Filtering in On-line Social Network
Author: M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari
Year: 2010
This work is the first step of a wider project. The early encouraging results we have obtained on the classification procedure prompt us to continue with other work that will aim to improve the quality of classification. Additionally, we plan to enhance our filtering rule system with a more sophisticated approach to managing those messages caught just for the tolerance and to deciding when a user should be inserted into a blacklist (BL). For instance, the system can automatically take a decision about the messages blocked because of the tolerance, on the basis of some statistical data (e.g., number of blocked messages from the same author, number of times the creator has been inserted in the BL) as well as data on the creator profile (e.g., relationships with the wall owner, age, sex). Further, we plan to test the robustness of our system against different adversary models. The development of a GUI to make BL and filtering rule specification easier is also a direction we plan to investigate.

UniProfile improves on previous work by modeling policies as first-class resources, protecting them in the same way as other resources, providing fine-grained control over policy disclosure, and clearly distinguishing between policy disclosure and policy satisfaction, which gives users more flexibility in expressing their authorization requirements. We also show that UniProfile can be used with practical negotiation strategies without jeopardizing autonomy in the choice of strategy, and present criteria under which negotiations using UniProfile are guaranteed to succeed in establishing trust.

Title: Achieving Secure, Scalable, and Fine-grained Data Access Control in Cloud Computing
Author: S. Yu, C. Wang, K. Ren, and W. Lou
Year: 2010
Cloud computing is an emerging computing paradigm in which resources of the computing infrastructure are provided as services over the Internet. As promising as it is, this paradigm also brings forth many new challenges for data security and access control when users outsource sensitive data for sharing on cloud servers, which are not within the same trusted domain as data owners. To keep sensitive user data confidential against untrusted servers, existing solutions usually apply cryptographic methods by disclosing data decryption keys only to authorized users. However, in doing so, these solutions inevitably introduce a heavy computation overhead on the data owner for key distribution and data management when fine-grained data access control is desired, and thus do not scale well. The problem of simultaneously achieving fine-grainedness, scalability, and data confidentiality of access control actually still remains unresolved.

Title: Inductive Learning Algorithms and Representations for Text Categorization
Author: Susan Dumais, John Platt, David Heckerman, Mehran Sahami
Year: 1998
Here, we describe results from experiments using a collection of hand-tagged financial newswire stories from Reuters. We use supervised learning methods to build our classifiers, and evaluate the resulting models on new test cases. The focus of our work has been on comparing the effectiveness of different inductive learning algorithms (Find Similar, Naive Bayes, Bayesian Networks, Decision Trees, and Support Vector Machines) in terms of learning speed, real-time classification speed, and classification accuracy. We also explored alternative document representations (words vs. syntactic phrases, and binary vs. non-binary features), and training set size.

Title: Learning and Revising User Profiles: The Identification of Interesting Web Sites
Author: M.J. Pazzani and D. Billsus
Year: 1997
We discuss algorithms for learning and revising user profiles that can determine which World Wide Web sites on a given topic would be interesting to a user. We describe the use of a naive Bayesian classifier for this task, and demonstrate that it can incrementally learn profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian classifier may easily be extended to revise user-provided profiles. In an experimental evaluation we compare the Bayesian classifier to computationally more intensive alternatives, and show that it performs at least as well as these approaches throughout a range of different domains. In addition, we empirically analyze the effects of providing the classifier with background knowledge in the form of user-defined profiles and examine the use of lexical knowledge for feature selection. We find that both approaches can substantially increase the prediction accuracy.

Title: Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions
Author: Gediminas Adomavicius and Alexander Tuzhilin
Year: 2005
The paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid recommendation approaches. The paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, improvement of understanding of users and items, incorporation of the contextual information into the recommendation process, support for multi-criteria ratings, and provision of more flexible and less intrusive types of recommendations.

Title: Text Categorization with Support Vector Machines: Learning with Many Relevant Features
Author: T. Joachims
Year: 1998
This paper introduces support vector machines (SVMs) for text categorization. It provides both theoretical and empirical evidence that SVMs are very well suited for text categorization. The theoretical analysis concludes that SVMs acknowledge the particular properties of text. The experimental results show that SVMs consistently achieve good performance on text categorization tasks, outperforming existing methods substantially and significantly. With their ability to generalize well in high-dimensional feature spaces, SVMs eliminate the need for feature selection, making the application of text categorization considerably easier. Another advantage of SVMs over conventional methods is their robustness.

In this paper we present two constructions of Fuzzy IBE schemes. Our constructions can be viewed as an Identity-Based Encryption of a message under several attributes that compose a (fuzzy) identity. Our IBE schemes are both error-tolerant and secure against collusion attacks. Additionally, our basic construction does not use random oracles. We prove the security of our schemes under the Selective-ID security model.

Title: Hierarchical Attribute-Based Encryption for Fine-Grained Access Control in Cloud Storage Services
Author: G. Wang, Q. Liu, and J. Wu
Year: 2010
Cloud computing, as an emerging computing paradigm, enables users to remotely store their data in a cloud so as to enjoy scalable services on demand. Small and medium-sized enterprises with limited budgets, in particular, can achieve cost savings and productivity enhancements by using cloud-based services to manage projects, to collaborate, and the like. However, allowing cloud service providers (CSPs), which are not in the same trusted domains as enterprise users, to take care of confidential data may raise potential security and privacy issues. To keep sensitive user data confidential against untrusted CSPs, a natural way is to apply cryptographic approaches, by disclosing decryption keys only to authorized users.

MODULES NAME
Authentication
Profile Generation
Accept Friends Request
Send Request
Share Photos
Post Comment or Text
Block Unwanted Comments

MODULE DESCRIPTION

Authentication:
In this module, the user registers personal details in the database and must be authenticated before going further. The user submits the details to the admin, who carries out the whole registration process. After registration is complete, the user logs in to the website with a username and password. If the user enters a valid username/password combination, access to the data is granted. If the username or password is invalid, the user is treated as unauthorized and access is denied.
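The register-then-login check described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the in-memory `_users` dictionary stands in for the database, and the salted PBKDF2 hashing is an assumption, since the text does not say how passwords are stored.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory stand-in for the project's user table.
_users = {}

def _hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is an assumption; the source does not describe password storage.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(username: str, password: str) -> bool:
    """Store a new user; return False if the username is already taken."""
    if username in _users:
        return False
    salt = os.urandom(16)
    _users[username] = (salt, _hash_password(password, salt))
    return True

def login(username: str, password: str) -> bool:
    """Return True only for a valid username/password combination."""
    record = _users.get(username)
    if record is None:
        return False  # unknown user: treated as unauthorized
    salt, stored = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, _hash_password(password, salt))
```

A caller would grant access to the wall pages only when `login` returns True and deny access otherwise.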

Profile:
In this module, the user creates a profile whose details are stored in the database. The profile contains the name, contact number, email address, photos, and other information. Logged-in users can view their details and edit any of them.

Friends:
In this module, the user adds new friends and views existing friends and their details. Logged-in users can see their friend list and add friends if they wish.

Send Request:
In this module, the user selects a friend and sends a request. A logged-in user can view incoming friend requests and accept them.

Post Photos:
In this module, the user retrieves photos from the database, adds new photos, and publishes them on their wall.

Post comments:
In this module, the user posts a photo on the public wall, and anyone can comment on it. Unknown persons can also post comments, but nothing is known about the character of a member who is outside the user's group.

Block unwanted comments:
In this module, the system calculates probabilities from the content of each message. The result is displayed on the user's private wall with two options, accept and reject.
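The accept/reject flow can be sketched as a small pending queue. Everything here is illustrative: `UNWANTED_WORDS`, `unwanted_probability`, and the `Wall` class are hypothetical names, and the word-fraction score is only a placeholder for the trained classifier the project describes.

```python
from dataclasses import dataclass, field

# Hypothetical word list; the real system would use a trained classifier.
UNWANTED_WORDS = {"vulgar", "violence", "politics"}

def unwanted_probability(text: str) -> float:
    """Toy score: fraction of words that appear in UNWANTED_WORDS."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in UNWANTED_WORDS)
    return hits / len(words)

@dataclass
class Wall:
    public: list = field(default_factory=list)   # comments visible to everyone
    pending: list = field(default_factory=list)  # (comment, probability) awaiting a decision

    def receive(self, comment: str) -> float:
        """Score a new comment and hold it on the private wall."""
        p = unwanted_probability(comment)
        self.pending.append((comment, p))
        return p

    def decide(self, index: int, accept: bool) -> None:
        """Apply the owner's accept/reject choice to a pending comment."""
        comment, _ = self.pending.pop(index)
        if accept:
            self.public.append(comment)
```

The owner sees each pending comment together with its score and calls `decide` with accept or reject; only accepted comments reach the public list.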

MODULE DIAGRAMS (elements shown in each diagram)
Authentication: User, Database, Allow to Access Page
Profile: User, Create Profile, Data Base
Accept Request: User, New Friends, View Friends, Data Base
Send Request: User, Select friend, Send request, Data Base
Share Photos: User, Add New Photos, Publish, View Photos, Data Base
Post comments: Share Photo, Known or Unknown person, Post Comments
Block the unwanted comments: Post Comments, Calculate Probability based on content of comments, Display the probability result on user private wall, Data Base

GIVEN INPUT EXPECTED OUTPUT


Authentication (Login/Registration)
Input: The user registers their details and supplies a username and password to log in.
Output: The user is granted access to the data.

Profile
Input: The user enters information such as email ID, contact number, and other details.
Output: The profile is created and the information is stored in the database.

Friends
Input: The user selects a friend to add.
Output: The friend is added to the user's profile.

Send Request
Input: The user selects a friend to send a request to.
Output: The request is sent.

Post Photos
Input: The user posts photos on their wall.
Output: The photos are displayed.

Post Comments
Input: Known or unknown persons send a comment.
Output: The comments are displayed.

Block the unwanted comments
Input: The comments are compared with the data in the database and the probability is calculated.
Output: The probability results are displayed on the user's private wall, giving the user direct control.

TECHNIQUE USED
Machine Learning (ML) Text Categorization Techniques:

Step 1: SHORT TEXT CLASSIFIER

i) Text Representation
We consider three types of features: Bag of Words (BoW), Document properties (Dp), and Contextual Features (CF). The first two are endogenous, that is, they are entirely derived from the information contained within the text of the message. Text representation using endogenous knowledge has good general applicability; however, in operational settings it is legitimate to also use exogenous knowledge, i.e., any source of information outside the message body but directly or indirectly related to the message itself.

ii) Machine Learning-Based Classification
We address short text categorization as a hierarchical two-level classification process. The first-level classifier performs a binary hard categorization that labels messages as Neutral or Nonneutral. The first-level filtering task facilitates the subsequent second-level task, in which a finer-grained classification is performed. The second-level classifier performs a soft partition of Nonneutral messages, assigning a given message a gradual membership to each of the nonneutral classes.
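The two-level process can be sketched as below. This is only a toy under stated assumptions: the keyword sets in `CLASS_KEYWORDS` are hypothetical stand-ins for trained models, the class names are borrowed from the unwanted-message examples given in the system design, and only the BoW feature is used.

```python
# Hypothetical keyword lists standing in for trained classifiers.
CLASS_KEYWORDS = {
    "Violence": {"fight", "kill", "attack"},
    "Vulgar": {"damn", "crap"},
    "Politics": {"election", "vote", "party"},
}

def bag_of_words(text):
    """Endogenous BoW feature: maps each term to its count."""
    counts = {}
    for token in text.lower().split():
        token = token.strip(".,!?;:")
        if token:
            counts[token] = counts.get(token, 0) + 1
    return counts

def first_level(bow):
    """Hard binary decision: 'Neutral' or 'Nonneutral'."""
    flagged = set().union(*CLASS_KEYWORDS.values())
    return "Nonneutral" if any(t in flagged for t in bow) else "Neutral"

def second_level(bow):
    """Soft partition: a membership value in [0, 1] for each nonneutral class."""
    total = sum(bow.values())
    return {
        label: sum(bow.get(k, 0) for k in keywords) / total
        for label, keywords in CLASS_KEYWORDS.items()
    }

def classify(text):
    """Run the first level, and the second level only for Nonneutral messages."""
    bow = bag_of_words(text)
    if first_level(bow) == "Neutral":
        return "Neutral", {}
    return "Nonneutral", second_level(bow)
```

Running the second level only on Nonneutral messages mirrors the hierarchy in the text: the hard decision filters first, and the graded memberships are computed only where they matter.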

iii) Radial Basis Function Networks (RBFN)
RBFNs have a single hidden layer of processing units with a local, restricted activation domain: a Gaussian function is commonly used, but any other locally tunable function can be used. They were introduced as a neural network evolution of exact interpolation and have been shown to have the universal approximation property.

Step 2: FILTERING RULES AND BLACKLIST MANAGEMENT

i) Filtering Rules:

a) Creator specification
A creator specification states conditions on the type, depth, and trust values of the relationships that creators must be involved in for the specified rules to apply to them. A creator specification creatorSpec implicitly denotes a set of OSN users. It can be expressed as:
A set of attribute constraints of the form an OP av, where an is a user profile attribute name, and av and OP are, respectively, a profile attribute value and a comparison operator compatible with an's domain.
A set of relationship constraints of the form (m, rt, minDepth, maxTrust), denoting all the OSN users participating with user m in a relationship of type rt, having a depth greater than or equal to minDepth and a trust value less than or equal to maxTrust.
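The two kinds of constraint in a creator specification can be sketched as simple predicates. The `Relationship` record, the `OPS` table, and both function names are hypothetical, and the way relationships are stored is an assumption made only for this illustration.

```python
import operator
from dataclasses import dataclass

# Comparison operators allowed in attribute constraints (an OP av).
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

@dataclass(frozen=True)
class Relationship:
    """Hypothetical edge record: `other` is related to `user`."""
    user: str
    other: str
    rtype: str
    depth: int
    trust: float

def satisfies_attributes(profile, constraints):
    """True if the profile meets every (an, OP, av) constraint."""
    return all(
        an in profile and OPS[op](profile[an], av)
        for an, op, av in constraints
    )

def satisfies_relationship(creator, relationships, constraint):
    """True if creator is related to m as (m, rt, minDepth, maxTrust) requires."""
    m, rt, min_depth, max_trust = constraint
    return any(
        r.user == m and r.other == creator and r.rtype == rt
        and r.depth >= min_depth and r.trust <= max_trust
        for r in relationships
    )
```

For example, the constraint `("owner", "friendOf", 1, 0.5)` selects creators who are in a `friendOf` relationship with `owner` at depth 1 or more and with trust of at most 0.5.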

b) Filtering Rule
A filtering rule is a tuple (author, creatorSpec, contentSpec, action), where:
author is the user who specifies the rule;
creatorSpec is a creator specification, as defined above;
contentSpec is a Boolean expression defined on content constraints of the form (C, ml), where C is a class of the first or second level and ml is the minimum membership level threshold required for class C to make the constraint satisfied;
action ∈ {block, notify} denotes the action to be performed by the system on the messages matching contentSpec and created by users identified by creatorSpec.

ii) Blacklists
A BL rule is a tuple (author, creatorSpec, creatorBehavior, T), where author is the OSN user who specifies the rule, i.e., the wall owner; creatorSpec is a creator specification, as defined above; creatorBehavior consists of two components, RFBlocked and minBanned; and T denotes the time period for which the users identified by creatorSpec and creatorBehavior are banned from the author's wall.
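A filtering rule and a time-limited blacklist can be sketched as follows. Several simplifications are assumptions of this illustration: creatorSpec is reduced to a callable, contentSpec is treated as a plain disjunction of (C, ml) constraints rather than a general Boolean expression, and creatorBehavior (RFBlocked, minBanned) is not modeled at all.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class FilteringRule:
    author: str
    creator_spec: Callable[[str], bool]    # stand-in for a creatorSpec check
    content_spec: List[Tuple[str, float]]  # (C, ml) constraints, OR-ed for simplicity
    action: str                            # "block" or "notify"

def apply_rule(rule: FilteringRule, creator: str,
               memberships: Dict[str, float]) -> Optional[str]:
    """Return the rule's action if the message matches, else None."""
    if not rule.creator_spec(creator):
        return None
    matched = any(memberships.get(c, 0.0) >= ml for c, ml in rule.content_spec)
    return rule.action if matched else None

class Blacklist:
    """Maps a banned creator to the moment the ban expires (start + T)."""

    def __init__(self) -> None:
        self._until: Dict[str, datetime] = {}

    def ban(self, creator: str, now: datetime, period: timedelta) -> None:
        self._until[creator] = now + period

    def is_banned(self, creator: str, now: datetime) -> bool:
        until = self._until.get(creator)
        return until is not None and now < until
```

In this arrangement the blacklist is consulted before any filtering rule, so messages from a banned creator are rejected regardless of their content until the period T has elapsed.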

SYSTEM SPECIFICATION

SOFTWARE REQUIREMENTS:
Operating system : Windows 7
IDE : Microsoft Visual Studio .NET 2010
Front End : WPF
Coding Language : C#
Backend : SQL Server 2008

HARDWARE REQUIREMENTS:
Processor : Pentium Dual Core 2.00 GHz
Hard disk : 40 GB
Mouse : Logitech
RAM : 2 GB (minimum)
Keyboard : 110 keys enhanced

SYSTEM DESIGN
In this project, the user creates a profile whose details are stored in the database. Logged-in users can view their details and edit any of them. Users can add new friends, view their friend list and their friends' details, post new messages, and view their friends' messages. When a logged-in user posts a photo on their wall, known or unknown persons can comment on it. Some of these comments are unwanted, for example vulgar, political, or violent messages. The system therefore calculates probabilities based on the content of each comment and sends the results to the user's private wall. The user then makes the decision based on the result.

USE CASE DIAGRAM:


A use case diagram is a type of behavioral diagram created from a use-case analysis. Its purpose is to present an overview of the functionality provided by a system in terms of actors, their goals, and any dependencies between the use cases.

[Use case diagram. Labels: Register/Login, Profile, OSN User, Accept the Request, Send request, OSN Managers, Post Photos and display Comments, Known Person, Unknown Person, Calculate the Probability]

CLASS DIAGRAM
A class diagram in UML is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, and the relationships between the classes. Private visibility hides information from anything outside the class partition. Public visibility allows all other classes to view the marked information. Protected visibility allows child classes to access information they inherited from a parent class.

OBJECT DIAGRAM:
An object diagram in the Unified Modeling Language (UML) is a diagram that shows a complete or partial view of the structure of a modeled system at a specific time. An Object diagram focuses on some particular set of object instances and attributes, and the links between the instances. A correlated set of object diagrams provides insight into how an arbitrary view of a system is expected to evolve over time. Object diagrams are more concrete than class diagrams, and are often used to provide examples, or act as test cases for the class diagrams. Only those aspects of a model that are of current interest need be shown on an object diagram.

[Object diagram. Objects and attributes as labeled: User Login (Name, Password); Profile (Category, Property); Photos (Add photos); Friends (Characters, Book Informations); Message (Post message, Probability)]

STATE DIAGRAM
A state diagram is a type of diagram used in computer science and related fields to describe the behavior of systems. State diagrams require that the system described is composed of a finite number of states; sometimes, this is indeed the case, while at other times this is a reasonable abstraction. There are many forms of state diagrams, which differ slightly and have different semantics.

[State diagram. Actors: User, Known/Unknown Person. States in order: Login, Send Request, Accept the Request, Share the Photo, Post Comments, Block the unwanted comments]

ACTIVITY DIAGRAM:
Activity diagrams are loosely defined diagrams that show workflows of stepwise activities and actions, with support for choice, iteration, and concurrency. In UML, activity diagrams can be used to describe the business and operational step-by-step workflows of components in a system. UML activity diagrams can also model the internal logic of a complex operation. In many ways, UML activity diagrams are the object-oriented equivalent of flow charts and data flow diagrams (DFDs) from structured development.

[Activity diagram. Actors: User, Known/Unknown Persons. Activities in order: Login, Send Request, Accept the Requests, Share Photos/Messages, Post the comments]

SEQUENCE DIAGRAM:
A sequence diagram in UML is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a message sequence chart. Sequence diagrams are sometimes called event-trace diagrams, event scenarios, or timing diagrams. The diagram below shows the sequence in which the process occurs in this project.

[Sequence diagram. Participants: Login, User, Known/Unknown Person, Public Wall, Firewall. Messages in order: Login; Login; Send request; Accept the requests; Share Photos; Post Comments; Classification based on the content; Calculate the Probability based on the comments content; Direct Control Accept/Reject]

COLLABORATION DIAGRAM:
A collaboration diagram shows the objects and relationships involved in an interaction, and the sequence of messages exchanged among the objects during the interaction.

The collaboration diagram can be a decomposition of a class, class diagram, or part of a class diagram. It can be the decomposition of a use case, use case diagram, or part of a use case diagram. The collaboration diagram shows messages being sent between classes and objects (instances). A diagram is created for each system operation that relates to the current development cycle (iteration).

[Collaboration diagram. Objects: Login, User, Known/Unknown Person, Public Wall, Firewall. Messages: 1: Login; 2: Login; 3: Send request; 4: Accept the requests; 5: Share Photos; 6: Post Comments; 7: Classification based on the content; 8: Calculate the Probability based on the comments content; 9: Direct Control Accept/Reject]

COMPONENT DIAGRAM:
The component diagram's main purpose is to show the structural relationships between the components of a system. In UML 1.1, a component represented implementation items, such as files and executables. Unfortunately, this conflicted with the more common use of the term "component," which refers to things such as COM components. Over time and across successive releases of UML, the original UML meaning of components was mostly lost. UML 2 officially changes the essential meaning of the component concept; in UML 2, components are considered autonomous, encapsulated units within a system or subsystem that provide one or more interfaces.

[Component diagram. Components: Login, User, Known/Unknown Person, Black List, FireWall, Calculate Probability, Classifier]

DATA FLOW DIAGRAM:


A Data flow diagram (DFD) is a graphical representation of the flow of data through an information system. It differs from the flowchart as it shows the data flow instead of the control flow of the program. A data flow diagram can also be used for the visualization of data processing. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts.

[Data flow diagrams: Level 0, Level 1, Level 2, Level 3, Level 4]

All Levels:

In this data flow diagram (DFD), the trusted authority is the head of the project and the domain authority is the subhead, which performs the administrator operations. The data owner first obtains permission from the domain authority, then encrypts the data and stores it in cloud storage. At the same time, the trusted authority creates a decryption key for the relevant data and provides it to the domain authority. The data consumer pays the domain authority and receives the decryption key, then retrieves the data from cloud storage and decrypts it.

E-R DIAGRAM:
In software engineering, an entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs. An entity-relationship (ER) diagram is a specialized graphic that illustrates the relationships between entities in a database. ER diagrams often use symbols to represent three different types of information. Boxes are commonly used to represent entities, diamonds are normally used to represent relationships, and ovals are used to represent attributes.

[E-R diagram. Labels: Login, Classifiers, Black List, Users, Fire Wall, Accept the others request, Share Photos, Known/Unknown Persons, Calculate Probability, Register, Login, Post Comments]

SYSTEM ARCHITECTURE
An OSN architecture has three layers: the graphical user interface, the social network application, and the social network manager. The social network manager handles basic functionality such as profile management and network-based functions. Nowadays, users of online social networks receive unwanted comments, so this project focuses on the other two layers and adds new conditions to them. The application layer contains a short text classifier and a content-based message filter. The short text classifier categorizes messages according to their content. The content-based message filter applies blacklists and filtering policies. The system first determines the relationship between the wall owner and the message sender, then filters the message and computes its class probabilities using the classifier. The probability result is then sent to the user. The proposed system thus gives users direct control over which kinds of messages are displayed on their walls.

Future Enhancement Module Diagram & Description

Future Enhancement Description:


We plan to extend this work in several directions. The proposed system blocks comments temporarily, based on their content and the creator specification. In the future, we plan to block senders permanently, based on the content probability and creator specifications.

[Permanent Block diagram. Labels: User, Share Photo, Block Permanently, Known/Unknown Persons, Creator Specification, Data base]

GIVEN INPUT EXPECTED OUTPUT


Permanently Block
Input: The details of the unwanted message sender are stored in the database.
Output: The message sender is blocked permanently, based on the relationship and the message content.

ADVANTAGES
The proposed system gives users control over what kinds of messages are posted on their own walls. It is secure because the filtered wall acts as an administrator. The core components of the proposed system are the Content-Based Messages Filtering (CBMF) and Short Text Classifier modules. A rule layer of filtering rules (FRs) is adopted for filtering unwanted messages. A blacklist (BL) mechanism blocks messages from undesired creators, independent of their contents.

APPLICATION
Online Social Network Application: In online social network applications, any account holder can comment on content that a user shares, such as messages or photos. In this process, some known or unknown persons post unwanted comments, for example political, vulgar, or violent ones. This system gives users direct control over which kinds of messages are posted on their walls. The concept can be implemented in all kinds of online social networks, such as Facebook, Twitter, and Orkut.

CONCLUSION
We have presented a system that gives users direct control to block unwanted messages on their social network walls. The system uses a machine learning soft classifier to label content as Neutral or Nonneutral, and then applies filtering rules based on the message creators. Moreover, the flexibility of the system in terms of filtering options is enhanced through the management of BLs.

BIBLIOGRAPHY:
1. M. Chau and H. Chen, A Machine Learning Approach to Web Page Filtering Using Content and Structure Analysis, Decision Support Systems, vol. 44, no. 2, pp. 482-494, 2008.
2. R.J. Mooney and L. Roy, Content-Based Book Recommending Using Learning for Text Categorization, Proc. Fifth ACM Conf. Digital Libraries, pp. 195-204, 2000.
3. F. Sebastiani, Machine Learning in Automated Text Categorization, ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
4. M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, Content-Based Filtering in On-Line Social Networks, Proc. ECML/PKDD Workshop Privacy and Security Issues in Data Mining and Machine Learning (PSDML '10), 2010.
5. N.J. Belkin and W.B. Croft, Information Filtering and Information Retrieval: Two Sides of the Same Coin?, Comm. ACM, vol. 35, no. 12, pp. 29-38, 1992.
6. P.J. Denning, Electronic Junk, Comm. ACM, vol. 25, no. 3, pp. 163-165, 1982.
7. P.W. Foltz and S.T. Dumais, Personalized Information Delivery: An Analysis of Information Filtering Methods, Comm. ACM, vol. 35, no. 12, pp. 51-60, 1992.
8. P.S. Jacobs and L.F. Rau, Scisor: Extracting Information from On-Line News, Comm. ACM, vol. 33, no. 11, pp. 88-97, 1990.
9. S. Pollock, A Rule-Based Message Filtering System, ACM Trans. Office Information Systems, vol. 6, no. 3, pp. 232-254, 1988.
10. P.E. Baclace, Competitive Agents for Information Filtering, Comm. ACM, vol. 35, no. 12, p. 50, 1992.
11. M.J. Pazzani and D. Billsus, Learning and Revising User Profiles: The Identification of Interesting Web Sites, Machine Learning, vol. 27, no. 3, pp. 313-331, 1997.
