IIIT-Delhi Institutional Repository

A sketch-based approach towards scalable and efficient attributed network embedding

dc.contributor.author Chakraborty, Tapadeep
dc.contributor.author Bera, Debajyoti (Advisor)
dc.date.accessioned 2023-04-03T08:34:35Z
dc.date.available 2023-04-03T08:34:35Z
dc.date.issued 2021-12
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1064
dc.description.abstract With the advent of big data, graphs have gained popularity as one of the most efficient data storage mechanisms. A graph can not only capture relationships between entities but also store attributes associated with those entities in the form of attributed nodes, which makes graphs a versatile data structure. Attributed network embedding refers to the task of representing each node of a graph as a low-dimensional vector that captures its neighborhood associations and attribute information. A downstream machine learning algorithm can use such an embedding to perform node classification, link prediction, and community detection tasks. Several recently proposed learning-based methods produce high-utility embeddings, but their embedding space and embedding time scale poorly with network size, and they struggle on massive billion-scale networks. Our study addresses this problem by introducing BGENA (Binary-embedding GENerator for Attributed graphs), which uses a recently proposed fast and utility-preserving sketching method, BinSketch, along with a novel edge propagation mechanism to generate a binary embedding of each node. BGENA is designed to preserve any arbitrary order of proximity of nodes within its embedding. Because it uses only fast bitwise operations for the entire embedding process, BGENA achieves a speedup of between 10× and 100× over some existing methods. BGENA’s binary embeddings allow for efficient bit-array/sparse-matrix representations to save space, making its memory requirement four to eight times smaller. We also propose a parallelized version named PBGENA (Parallelized BGENA), which uses MPI to leverage the multi-core architecture of a system and accelerates embedding by nearly 16× over BGENA.
PBGENA produced embeddings for all our graphs with 20,000 or fewer nodes in less than a second on an AMD 32-core 3.2GHz server, and it embedded TWeibo, a graph with over 2 million nodes and 50 million edges, in less than two minutes. Further, BGENA is the only method known to us that was able to embed MAKG, a graph with nearly 60 million nodes and a billion edges, within the 270GB memory cap of the system, in just 8 hours and with comparable accuracy. We evaluate PBGENA embeddings on tasks such as node classification, link prediction, and graph visualization with several real-world networks of varied sizes, and it outperforms the state-of-the-art baselines, often by large margins and in a fraction of the time. Our experiments found that specific embedding methods favor particular graphs, on which their results are among the best, but underperform significantly on other graphs. After hyperparameter tuning, no such effects were observed for PBGENA. All of these make PBGENA a robust, high-utility, cost-effective embedding method with a low space budget. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Attributed Network Embedding en_US
dc.subject Network Representation Learning en_US
dc.subject Node Embedding en_US
dc.subject Sketching en_US
dc.subject Edge Propagation en_US
dc.subject Node Classification en_US
dc.subject Link Prediction en_US
dc.subject BinSketch en_US
dc.subject Parallelization en_US
dc.subject Message Passing Interface en_US
dc.title A sketch-based approach towards scalable and efficient attributed network embedding en_US
dc.type Thesis en_US
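
Illustrative note: the abstract describes binary node embeddings built from a sketching step and an edge propagation step, using only bitwise operations. The following is a minimal sketch of that general idea and not the thesis's implementation. It assumes a BinSketch-style mapping in which each attribute index is assigned to a random sketch position and the sketch is the OR of the positions of all set attributes. The propagation step shown here (a plain OR with each neighbour's sketch) is a simplifying assumption, and all function names and parameters are hypothetical.

```python
import random

SKETCH_BITS = 16  # hypothetical embedding width in bits


def make_bucket_map(num_attrs, sketch_bits, seed=0):
    """Assign every attribute index to a random sketch position."""
    rng = random.Random(seed)
    return [rng.randrange(sketch_bits) for _ in range(num_attrs)]


def sketch(attr_indices, bucket_map):
    """OR together the positions of all set attributes.

    The result is a Python int used as a bit array.
    """
    bits = 0
    for i in attr_indices:
        bits |= 1 << bucket_map[i]
    return bits


def propagate(sketches, edges):
    """One round of propagation over undirected edges (assumed rule).

    Each node's new sketch is its old sketch ORed with the old
    sketches of its neighbours.
    """
    out = list(sketches)
    for u, v in edges:
        out[u] |= sketches[v]
        out[v] |= sketches[u]
    return out


def hamming(a, b):
    """Number of differing bits between two binary embeddings."""
    return bin(a ^ b).count("1")


# Toy graph: three nodes with sparse binary attributes, one edge (0, 1).
node_attrs = [[0, 1], [1, 2], [7]]
edges = [(0, 1)]

bucket_map = make_bucket_map(num_attrs=8, sketch_bits=SKETCH_BITS)
sketches = [sketch(attrs, bucket_map) for attrs in node_attrs]
propagated = propagate(sketches, edges)

for node, emb in enumerate(propagated):
    print(node, format(emb, f"0{SKETCH_BITS}b"))
```

Repeating `propagate` k times would fold in attributes from nodes up to k hops away, which is one simple way to read "order of proximity" here. Storing each embedding as a fixed-width bit array is what makes the compact representation mentioned in the abstract possible.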