
1. Topic: Application of Genetic Algorithms to the Identification of Website Link Structure
Link: http://ieeexplore.ieee.org/document/5562786/
Abstract: This paper explores website link structure considering websites as
interconnected graphs and analyzing their features as a social network. Factor
Analysis provides the statistical methodology to adequately extract the main website
profiles in terms of their internal structure. However, due to the large number of
indicators, a genetic search of their optimum number is proposed, and applied to a
case study based on 80 Spanish University websites. Results provide coherent and
relevant website profiles, and highlight the possibilities of Genetic Algorithms as a tool
for discovering new knowledge related to website link structures.
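
The paper does not include its implementation, so the following is only a minimal Python sketch of the kind of genetic search it describes: evolving a bit-mask that selects a subset of link-structure indicators. The indicator names and the toy fitness function are illustrative assumptions, not the authors' definitions; a real implementation would score each candidate subset by running Factor Analysis on the selected indicators.

import random

# Illustrative indicator names; the paper's actual indicators differ.
INDICATORS = ["in_degree", "out_degree", "pagerank", "clustering_coef",
              "betweenness", "closeness", "avg_path_len", "reciprocity"]

def fitness(mask, scores):
    """Toy fitness: reward informative subsets while penalising size.
    A real version would run Factor Analysis on the selected columns."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    return sum(scores[i] for i in selected) - 0.1 * len(selected)

def genetic_search(scores, pop_size=30, generations=50, p_mut=0.05):
    n = len(scores)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, scores), reverse=True)
        parents = ranked[: pop_size // 2]                # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, scores))

if __name__ == "__main__":
    toy_scores = [random.random() for _ in INDICATORS]   # stand-in for real data
    best = genetic_search(toy_scores)
    print([name for name, bit in zip(INDICATORS, best) if bit])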

2. Topic: Efficient methodologies to optimize Website for link structure based search engines
Link: http://ieeexplore.ieee.org/document/6823528/
Abstract: As more and more e-commerce companies mushroom on the Internet,
competition to appear at the top of Search Engine Results Pages (SERPs) becomes
intense. With the massive growth of the Internet, reliance on Search Engines for
Information Retrieval (IR) has become mandatory. This paper provides an introduction
to link structure based search engine ranking algorithms and efficient methodologies
to optimize websites for link structure based ranking algorithms.
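
The abstract does not name a particular ranking algorithm. PageRank is the best-known link-structure based ranker, so a minimal iterative PageRank over a toy site graph is sketched below purely as an illustration of what such algorithms compute; the graph and damping factor are invented for the example.

def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank over a dict {page: [outgoing links]}."""
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, links in graph.items():
            if not links:                      # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in links:
                    new_rank[target] += damping * rank[page] / len(links)
        rank = new_rank
    return rank

# Illustrative site graph: every key is a page, values are its outgoing links.
site = {"home": ["about", "products"], "about": ["home"],
        "products": ["home", "about"]}
print(pagerank(site))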

3. Topic: An enhanced model for effective navigation of a website using clustering technique
Link: http://ieeexplore.ieee.org/document/7033817/

Abstract: Designing well-structured websites that facilitate effective user navigation has
long been a challenge. The reason for poor website design is that the web developers'
understanding of website structure is considerably different from that of the users.
Reorganizing a website's structure changes the location of familiar items, so such
reorganizations are not performed frequently enough to improve navigability. The
existing Mathematical Programming model facilitates user navigation on a website
with minimal changes to its current structure. It is appropriate for informational
websites whose content is static and relatively stable over time. It allows a page to
have more links than the out-degree threshold if the cost is reasonable. The number of
relevant mini sessions decreases as the path threshold increases. Thus the Mathematical
Programming model focuses on enhancing the design of existing links before adding
new links. To improve navigability further, an enhanced model using a Graph
Partitioned Clustering algorithm is proposed. It is used to group potential users with
similar navigation patterns from preprocessed web log data. Clustering results include
the number of visits made to a single webpage, the most frequently viewed page and
the navigation behavior of the users. To minimize the average weighted shortest
distance between pages and the average surfing distance, the enhanced mathematical
programming model is proposed and applied to the clustered results. After identifying
the existing links to be improved and the new links to be added, the website's current
structure is reorganized, effectively improving user navigation.

4. Topic: Website Structure Optimization System Model and Algorithms
Link: http://ieeexplore.ieee.org/document/6113705/
Abstract: Website structure optimization is of great research value and practical
significance and has drawn the attention of researchers at home and abroad. This
article carries out an in-depth analysis of the website structure optimization system
model and its characteristics at each phase, and adopts specific algorithms for the
main phases of this model. The algorithm can be divided into three steps: generating
frequent closed sequential patterns, applying hierarchical clustering to those patterns,
and generating a partial order based on the clustering result. The article gives a full
description of the algorithms for each step. In conclusion, it puts forward feasibility
proposals for website structure optimization supported by experimental results. The
purpose is to optimize link functions so as to reduce users' clicks on "unnecessary"
pages.

5. Topic: PageChaser: A Tool for the Automatic Correction of Broken Web Links
Link: http://ieeexplore.ieee.org/document/4497598/
Abstract: PageChaser is a system that monitors links between Web pages and
searches for the new locations of moved Web pages when it finds broken links. The
problem of searching for moved pages is different from typical information retrieval
problems. First, it is impossible to identify the final destination until the page is
actually moved, so the index-server approach is not necessarily effective. Secondly,
there is a large bias about where the new address is likely to be and crawler-based
solutions can be effectively implemented, avoiding the need to search the entire Web.
PageChaser incorporates a comprehensive set of heuristics, some of which are novel,
in a single unified framework. This paper explains the underlying ideas behind the
design and development of PageChaser.
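
PageChaser's actual heuristics are not spelled out in the abstract. The sketch below, in Python with the third-party requests library, only illustrates the general idea: detect a broken link, then probe a few plausible relocation candidates. The candidate-generation rules (parent directory, site root, an archived copy) are assumptions for illustration, not the tool's method.

import requests  # third-party HTTP client, assumed available
from urllib.parse import urlparse, urlunparse

def is_broken(url, timeout=5):
    """Treat 4xx/5xx responses and network errors as broken links."""
    try:
        return requests.head(url, allow_redirects=True,
                             timeout=timeout).status_code >= 400
    except requests.RequestException:
        return True

def candidate_locations(url):
    """Illustrative relocation guesses: parent directory, site root,
    and an archived copy. Real heuristics would be far richer."""
    parts = urlparse(url)
    parent_path = "/".join(parts.path.rstrip("/").split("/")[:-1]) + "/"
    yield urlunparse(parts._replace(path=parent_path))
    yield urlunparse(parts._replace(path="/"))
    yield "https://web.archive.org/web/" + url

def chase(url):
    if not is_broken(url):
        return url
    for candidate in candidate_locations(url):
        if not is_broken(candidate):
            return candidate
    return None  # could not recover a new location

print(chase("http://example.com/old/page.html"))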

6. Topic: Analysis and Modelling of Websites Quality Using Fuzzy Technique
Link: http://ieeexplore.ieee.org/document/6168324/
Abstract: The quality of websites is evaluated on the basis of eleven important metrics:
load time, response time, mark-up validation, broken link, accessibility error, size,
page rank, frequency of update, traffic and design optimisation. The metrics are taken
as linguistic variables. Fuzzy logic is used to evaluate the grade of a website.
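
The abstract lists the metrics but not the membership functions or rule base. The sketch below shows, for a single metric (load time), how a fuzzy grade could be computed with assumed triangular membership functions and an invented three-rule base; the thresholds and grade weights are illustrative only.

def triangular(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def load_time_grade(seconds):
    """Fuzzify load time into 'fast'/'average'/'slow' and defuzzify to a
    0-100 quality grade by weighted average (illustrative rule base)."""
    fast = triangular(seconds, -1.0, 0.0, 2.0)
    average = triangular(seconds, 1.0, 3.0, 5.0)
    slow = triangular(seconds, 4.0, 8.0, 100.0)
    weights = {90: fast, 60: average, 20: slow}   # grade tied to each fuzzy set
    total = sum(weights.values())
    return sum(g * w for g, w in weights.items()) / total if total else 0.0

for t in (0.5, 2.5, 7.0):
    print(t, "s ->", round(load_time_grade(t), 1))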

7. Topic: A Large Scale URL Verification Pipeline Using Hadoop
Link: http://ieeexplore.ieee.org/document/6137375/
Abstract: Data quality is a key element for local search and advertising. Inaccurate,
out-of-date or missing information causes an unpleasant search experience for users
and affects competitiveness of service providers. This paper addresses the problem
of evaluating link quality for business listings in local search and online advertising
domain. We introduce a novel system where we apply data mining technologies on a
Hadoop-based platform to provide an efficient and highly scalable solution for the
problem. Due to various reasons, links associated with business listings do not always
point to their business websites. Possible sources of noise include parked domains,
broken links, third-party advertisers, irrelevant websites, etc. To detect this noise and
improve link quality, we formulate the problem as a binary classification problem: whether a
given URL is the business website of the associated listing. Experiments conducted on
real-world data show that our system can verify millions of business listings against
about 100 million web pages in a couple of hours with 93% classification accuracy.
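
The paper's Hadoop pipeline and feature set are not given in the abstract. The sketch below only illustrates the binary-classification formulation on a toy feature vector, using scikit-learn rather than the authors' stack; the features and training examples are invented for the example.

from sklearn.linear_model import LogisticRegression

def features(listing_name, url, page_title):
    """Toy features: name/title token overlap plus two crude URL signals."""
    name_tokens = set(listing_name.lower().split())
    title_tokens = set(page_title.lower().split())
    overlap = len(name_tokens & title_tokens) / max(len(name_tokens), 1)
    looks_parked = int("parked" in url or "domainsforsale" in url)
    has_path = int(url.rstrip("/").count("/") > 2)
    return [overlap, looks_parked, has_path]

# Tiny illustrative training set: label 1 = URL is the listing's own website.
X = [features("Joe's Pizza", "http://joespizza.example.com/", "Joe's Pizza - Menu"),
     features("Joe's Pizza", "http://parked.example.net/", "Domain parked"),
     features("Acme Dental", "http://acmedental.example.org/about", "Acme Dental Clinic"),
     features("Acme Dental", "http://ads.example.com/", "Buy ads cheap")]
y = [1, 0, 1, 0]

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("Joe's Pizza",
                            "http://joespizza.example.com/menu", "Joe's Pizza")]))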

8. Topic: Identification and characterization of crawlers through analysis of web logs
Link: http://ieeexplore.ieee.org/document/6731972/
Abstract: Web crawlers are software programs that automatically traverse the
hyperlink structure of the world-wide web in order to locate and retrieve information.
In addition to crawlers from search engines, we observed many other crawlers which
may gather business intelligence, confidential information or even execute attacks
based on gathered information while camouflaging their identity. Therefore, it is
important for a website owner to know who has crawled their site and what they have
done. In this study we have analyzed crawler patterns in web server logs, developed a
methodology to identify crawlers and classified them into three categories. To
evaluate our methodology we used seven test crawler scenarios. We found that
approximately 53.25% of web crawler sessions were from known crawlers and
34.16% exhibited suspicious behavior.
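
The authors' exact identification rules are not described in the abstract. The sketch below shows a common style of log-based heuristic for the same task: match known bot user agents, then flag sessions that touch robots.txt or never fetch page assets as suspicious. The log record format, thresholds and category names are assumptions.

import re

KNOWN_BOT_AGENTS = re.compile(r"googlebot|bingbot|baiduspider|yandexbot", re.I)

def classify_session(records):
    """records: list of dicts with 'path' and 'agent' for one client session.
    Returns 'known crawler', 'suspicious' or 'human' (illustrative rules)."""
    agents = " ".join(r["agent"] for r in records)
    paths = [r["path"] for r in records]
    if KNOWN_BOT_AGENTS.search(agents):
        return "known crawler"
    hit_robots = any(p == "/robots.txt" for p in paths)
    no_assets = not any(p.endswith((".css", ".js", ".png", ".jpg")) for p in paths)
    # Browsers normally fetch page assets; a session that never does, or that
    # reads robots.txt without declaring a bot agent, looks like a hidden crawler.
    if hit_robots or (no_assets and len(paths) > 20):
        return "suspicious"
    return "human"

session = [{"path": "/robots.txt", "agent": "Mozilla/5.0"},
           {"path": "/products", "agent": "Mozilla/5.0"}]
print(classify_session(session))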

9. Topic: JavaScript Based Page Rendering
Link: http://openmymind.net/2012/5/30/Client-Side-vs-Server-Side-Rendering/
Abstract: -

10. Topic: Distributed Algorithms for Constructing a Depth-First-Search Tree
Link: http://ieeexplore.ieee.org/document/5727871/
Abstract: We present more efficient distributed depth-first-search algorithms which
construct a depth-first-search tree for a communication network. The algorithms
require |V|(1 + r) messages and |V|(1 + r) units of time in the worst case, where |V| is
the number of sites in the network and 0 ≤ r ≤ 1. The value of r depends on the
network topology and possibly on the routing chosen. In the best case, when the
underlying network has a ring topology, r = 0 and our basic algorithm requires |V|
messages and time units, regardless of routing. We extend this algorithm to achieve
the same best case bound for other topologies. The worst case bound, which has
r = 1 - 2/|V|, applies if the network topology is a tree. The improvement over the best
of previous algorithms is achieved by dynamic backtracking, with a minor increase in
message length.
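
The distributed message-passing protocol itself is too involved for a short example, but the object the algorithms construct is simply a depth-first-search tree, i.e. a parent pointer for every site. A sequential Python sketch of that construction over an illustrative network graph:

def dfs_tree(graph, root):
    """Return a parent map describing a depth-first-search tree of `graph`
    (a dict of adjacency lists) rooted at `root`."""
    parent = {root: None}
    def visit(node):
        for neighbour in graph[node]:
            if neighbour not in parent:   # first visit fixes the DFS parent
                parent[neighbour] = node
                visit(neighbour)
    visit(root)
    return parent

# Illustrative 4-site network; keys are sites, values their neighbours.
network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(dfs_tree(network, "A"))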

11. Topic: The average complexity of depth-first search with backtracking and cutoff
Link: http://ieeexplore.ieee.org/document/5390174/
Abstract: This paper analyzes two algorithms for depth-first search of binary trees.
The first algorithm uses a search strategy that terminates the search when a
successful leaf is reached. The algorithm does not use internal cutoff to restrict the
search space. If N is the depth of the tree, then the average number of nodes visited
by the algorithm is as low as O(N) and as high as O(2^N), depending only on the value of
the probability parameter that characterizes the search. The second search algorithm
uses backtracking with cutoff. A decision to cut off the search at a node eliminates the
entire subtree below that node from further consideration. The surprising result for
this algorithm is that the average number of nodes visited grows linearly in the depth
of the tree, regardless of the cutoff probability. If the cutoff probability is high, then
the search has a high probability of failing without examining much of the tree. If the
cutoff probability is low, then the search has a high probability of succeeding on the
leftmost path of the tree without performing extensive backtracking. This model
sheds light on why some instances of NP-complete problems are solvable in practice
with a low average complexity.
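
A small simulation of the second model in the abstract can make the result concrete: a depth-first search of a binary tree that, at each internal node, prunes the whole subtree with some cutoff probability and counts the nodes it visits. The probability values below are arbitrary parameters chosen for illustration, not figures from the paper.

import random

def search(depth, p_cutoff, p_success):
    """Return (found, nodes_visited) for a depth-first search of a binary
    tree of the given depth with probabilistic cutoff at each internal node."""
    if depth == 0:                                  # leaf: succeeds with p_success
        return random.random() < p_success, 1
    if random.random() < p_cutoff:                  # cut off: prune this subtree
        return False, 1
    visited = 1
    for _ in range(2):                              # explore both children in order
        found, v = search(depth - 1, p_cutoff, p_success)
        visited += v
        if found:
            return True, visited                    # stop at the first successful leaf
    return False, visited

trials = [search(20, 0.3, 0.1)[1] for _ in range(1000)]
print("average nodes visited:", sum(trials) / len(trials))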

12. Topic: Depth-limited search applied to compute n-order reflections in the analysis of the RCS in large and complex targets
Link: http://ieeexplore.ieee.org/document/4619954/
Abstract: A new iterative algorithm able to consider multiple interactions between
different flat surfaces, to compute efficiently the monostatic and/or bistatic radar
cross section (RCS) of complex targets, is presented. This method is based on a
combination of several ray-tracing acceleration techniques, such as the AZB and SVP
algorithms, together with uninformed search strategies such as depth-limited search,
reducing the CPU time and memory resources.

13. Topic: Distributed web crawling: A framework for crawling of micro-blog data
Link: http://ieeexplore.ieee.org/document/7446438/

Abstract: These days, social networks attract people to express and share their
interests. We aim to monitor public opinions and other valuable discoveries by using
the data collected from the social network website Sina Weibo. This paper presents a
distributed web crawler framework called SWORM, which runs on the Raspberry Pi (a
cheap card-sized single-board computer) to fetch micro-blog data and outperforms
traditional web crawlers in efficiency, scale, scalability and cost. The framework can
easily be extended according to the specific needs of the user with the help of some
simple Python scripts. The paper first proposes a model of the micro-blog network to
determine what our crawler will crawl from the social website and how. It then
introduces the implementation details of the whole distributed system and finally
presents experimental results. We ran several crawlers within our framework on the
Raspberry Pi and stored the obtained resources in a shared MongoDB, a NoSQL
database. Experimental results demonstrate that the use of a distributed framework
can greatly improve the efficiency and accuracy of data collection.

14. Topic: Framework for Distributed Semantic Web Crawler
Link: http://ieeexplore.ieee.org/document/7546329/
Abstract: Relevant information retrieval from the WWW mainly depends on the
technique and efficiency of the crawler, so crawlers must be capable of understanding
the text and context of the links they are going to crawl. Anchor text contains very
useful information about the target web page, and knowledge of the target page's
content helps crawlers decide their preference for crawling that particular page. In this
paper we present the design of a distributed semantic web crawler capable of crawling
both HTML pages and semantic web pages written using OWL/RDF. In our crawler a
component called the page analyser is used to understand the theme of a page's
content and the context of anchor tags in the page. The output of the page analyser is
used to make crawling decisions. Our approach has revealed a great improvement in
extracting information from links and guiding the crawler toward more relevant
domain-specific crawling.

15. Topic: Smart distributed web crawler
Link: http://ieeexplore.ieee.org/document/7518893/

Abstract: Centralized crawlers are not adequate to spider meaningful and relevant
portions of the Web. A crawler with good scalability and load balancing can improve
performance. As the size of the Web grows, it is necessary to distribute the crawling
process in order to download pages in less time and increase crawler coverage. In this
paper, we present a smart distributed crawler for crawling the Web based on a
client-server architecture. In this architecture the load between the crawlers is
managed by the server: whenever a crawler becomes overloaded, its load is distributed
to the others by dynamically redistributing the URLs. Focused crawlers make efficient
use of network bandwidth and storage capacity and, when distributed, can enhance
performance.
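
The server/crawler protocol is not detailed in the abstract. The sketch below only illustrates the load-balancing idea: a server-side dispatcher that hands each new frontier URL to whichever crawler currently has the smallest queue. The class and method names, and the use of in-process queues instead of real network crawlers, are assumptions for illustration.

from collections import deque

class Dispatcher:
    """Server-side URL dispatcher that keeps per-crawler queues balanced."""
    def __init__(self, crawler_ids):
        self.queues = {cid: deque() for cid in crawler_ids}
        self.seen = set()                       # avoid re-crawling the same URL

    def submit(self, url):
        if url in self.seen:
            return
        self.seen.add(url)
        # Assign the URL to the least-loaded crawler.
        target = min(self.queues, key=lambda cid: len(self.queues[cid]))
        self.queues[target].append(url)

    def next_url(self, crawler_id):
        queue = self.queues[crawler_id]
        return queue.popleft() if queue else None

d = Dispatcher(["crawler-1", "crawler-2"])
for u in ["http://a.example/", "http://b.example/", "http://c.example/"]:
    d.submit(u)
print(d.next_url("crawler-1"), d.next_url("crawler-2"))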

16. Topic: Software solution for optimal planning of sales persons work based on Depth-First Search and Breadth-First Search algorithms
Link: http://ieeexplore.ieee.org/document/7522330/
Abstract: This paper presents and describes the practical usage of Depth-First Search
and Breadth-First Search algorithms in the planning and optimization of sales persons'
work. Experiments on the optimal implementation of these two algorithms for
planning purposes are carried out through a specially developed MATLAB simulator.
The application consists of two parts: a Web application and a Mobile application. The
Web application is developed using the Application Development Framework (ADF)
technology, while Oracle ADF Mobile is used for the development of the mobile
version. Both applications integrate with Google Maps and, based on the selected
input parameters (such as the origin and destination of sales persons, the way of
planning, any possible obstacles, etc.), they visually show the results, indicators and
analysis. The commercial version of the software, eSalesmanPlan, which uses the
aforementioned algorithms, is used in several companies in Bosnia and Herzegovina.
All indicators point to significant savings in both human and financial resources,
which are also presented.
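
The MATLAB simulator is not available from the abstract, but the planning primitive the paper builds on is easy to show: breadth-first search finding a shortest (fewest-hops) route between two locations on a small road graph. The graph below is illustrative.

from collections import deque

def bfs_route(graph, origin, destination):
    """Shortest route (fewest hops) from origin to destination, or None."""
    parent = {origin: None}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:             # walk parents back to the origin
                route.append(node)
                node = parent[node]
            return route[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

# Illustrative road graph between a depot and three stores.
roads = {"depot": ["storeA", "storeB"], "storeA": ["depot", "storeC"],
         "storeB": ["depot", "storeC"], "storeC": ["storeA", "storeB"]}
print(bfs_route(roads, "depot", "storeC"))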

17. Topic: Solutions for DDoS attacks on cloud
Link: http://ieeexplore.ieee.org/document/7508107/

Abstract: The Internet has become the key driver for virtually every organization's
growth, brand awareness and operational efficiency. Unfortunately, cyber terrorists
and organized criminals know this fact too. Using a Distributed Denial of Service
(DDoS) attack they can deny corporations and end users access to the Internet, make
websites slow, and deny access to corporate networks and data, leaving them unable
to serve legitimate users. It is not just these that are vulnerable; DDoS attacks can also
serve as diversions.

18. Topic: Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit
Link: http://ieeexplore.ieee.org/document/7551426/
Abstract: Modern GPUs require tens of thousands of concurrent threads to fully utilize
the massive amount of processing resources. However, thread concurrency in GPUs
can be diminished either due to shortage of thread scheduling structures (scheduling
limit), such as available program counters and single instruction multiple thread
stacks, or due to shortage of on-chip memory (capacity limit), such as register file and
shared memory. Our evaluations show that in practice concurrency in many general
purpose applications running on GPUs is curtailed by the scheduling limit rather than
the capacity limit. Maximizing the utilization of on-chip memory resources without
unduly increasing the scheduling complexity is a key goal of this paper. This paper
proposes a Virtual Thread (VT) architecture which assigns Cooperative Thread Arrays
(CTAs) up to the capacity limit, while ignoring the scheduling limit. However, to reduce
the logic complexity of managing more threads concurrently, we propose to place
CTAs into active and inactive states, such that the number of active CTAs still respects
the scheduling limit. When all the warps in an active CTA hit a long latency stall, the
active CTA is context switched out and the next ready CTA takes its place. We exploit
the fact that both active and inactive CTAs still fit within the capacity limit which
obviates the need to save and restore large amounts of CTA state. Thus VT
significantly reduces performance penalties of CTA swapping. By swapping between
active and inactive states, VT can exploit a higher degree of thread-level parallelism
without increasing logic complexity. Our simulation results show that VT improves
performance by 23.9% on average.

19. Topic: Multiprogramming and multiprocessing
Link: http://ieeexplore.ieee.org/document/6500645/

Abstract: Multiprogramming and multiprocessing make economically feasible the
capabilities of on-line debugging, man-machine interaction, and remote operation
and communication in information handling systems.

20. Topic: Exploiting Single-Threaded Model in Multi-Core In-Memory Systems
Link: http://ieeexplore.ieee.org/document/7486988/
Abstract: The widely adopted single-threaded OLTP model assigns a single thread to
each static partition of the database for processing transactions in a partition. This
simplifies concurrency control while retaining parallelism. However, it suffers
performance loss arising from skewed workloads as well as transactions that span
multiple partitions. In this paper, we present a dynamic single-threaded in-memory
OLTP system, called LADS, that extends the simplicity of the single-threaded model.
The key innovation in LADS is the separation of dependency resolution and execution
into two non-overlapping phases for batches of transactions. After the first phase of
dependency resolution, the record actions of the transactions are partitioned and
ordered. Each independent partition is then executed sequentially by a single thread,
avoiding the need for locking. By careful mapping of the tasks to be performed to
threads, LADS is able to achieve a high degree of balanced parallelism. We evaluate
LADS against H-Store, a partition-based database; DORA, a data-oriented transaction
processing system; and SILO, a multi-core in-memory OLTP engine. The experimental
study shows that LADS achieves up to 20x higher throughput than existing systems
and exhibits better robustness with various workloads.

21. Topic: Performance analysis of thread synchronization strategies in concurrent data structures based on flat-combining
Link: http://ieeexplore.ieee.org/document/7561508/

Abstract: The article deals with the development of thread synchronization strategies
based on concurrent flat-combining data structures, as well as research into their
performance. The paper considers the flat-combining approach and its implementation
in the libcds library, the development of a thread synchronization strategy and its
possible implementations. The efficiency of the synchronization strategies is studied
using the open-source library libcds as an example. The research revealed the strategy
with the lowest operation execution time on a container and the lowest CPU usage,
and identified use cases for the developed strategies. A mechanism for building
concurrent data structures with the developed synchronization strategy was
implemented, and the implemented strategies were integrated into the cross-platform
open-source library libcds.
