Link-Based Clustering Algorithm for Clustering Web Documents

P. Ashokkumar; S. Don

doi:10.1520/JTE20180497


Journal Published Online: 28 February 2019

Volume 47, Issue 6

CODEN: JTEVAB

Abstract

Clustering web documents typically requires feeding a large number of words into clustering methods such as K-Means, Cosine Similarity, Latent Dirichlet Allocation, and so on. As the number of words in each document increases, the clustering process consumes more time. In many web documents, web links are available along with the contents; the text of these links may carry a great deal of information useful for clustering. In our work, we show that using the web link text alone gives better clustering efficiency than considering the whole document text. We implemented our algorithm on two benchmark datasets, and the results show that our algorithm achieves higher clustering efficiency than the existing methods.
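
As a rough illustration of the idea of clustering on link text alone, the following minimal Python sketch extracts only the text inside `<a>` elements, builds a term-count vector per page, and groups pages by cosine similarity. It is not the authors' algorithm, which this page does not describe: the single-pass grouping rule, the `threshold` value, the function names, and the sample pages are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import re


class LinkTextParser(HTMLParser):
    """Collects only the words that appear inside <a> ... </a> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while the parser is inside an <a> element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.words.extend(re.findall(r"[a-z]+", data.lower()))


def link_text_vector(html):
    """Return a term-count vector built from link text only."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return Counter(parser.words)


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def cluster_by_link_text(pages, threshold=0.3):
    """Hypothetical single-pass grouping: a page joins the first cluster
    whose seed page is at least `threshold` similar, else starts a new one."""
    vectors = [link_text_vector(p) for p in pages]
    clusters = []  # each cluster is a list of page indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up sample pages; body text is ignored, only anchor text is used.
pages = [
    '<p>Long body text.</p><a href="/a">football scores</a> <a href="/b">football league</a>',
    '<p>Unrelated filler.</p><a href="/c">league tables</a> <a href="/d">football news</a>',
    '<p>More filler.</p><a href="/e">python tutorial</a> <a href="/f">python packages</a>',
]
print(cluster_by_link_text(pages))  # [[0, 1], [2]]
```

Because the vectors are built from a handful of link words rather than the full page text, each similarity computation touches far fewer terms, which is the cost saving the abstract points to. A real evaluation would replace the toy grouping rule with a standard clustering method and benchmark data.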

Author Information

Ashokkumar, P.

School of Computer Science and Engineering, Vellore, Tamil Nadu, India

Don, S.

TIFAC CORE in Automotive Infotronics, School of Computer Science and Engineering, Vellore, Tamil Nadu, India

Pages: 12

Reprints and Permissions

Reprints and copyright permissions can be requested through the Copyright Clearance Center.

Details

Stock #: JTE20180497

ISSN: 0090-3973

DOI: 10.1520/JTE20180497
