Big Data: Principles and Best Practices of Scalable Real-Time Data Systems (English) (Paperback): Book by Nathan Marz, James Warren

Product Details:

ISBN: 9789351198062
Publisher: Dreamtech Press
Year of publishing: 2015
Format: Paperback
No. of pages: 328
Language: English

This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm and NoSQL databases.

Table of contents:

Part 1: Batch layer

Data model for Big Data
    The properties of data; The fact-based model for representing data; Graph schemas; A complete data model for SuperWebAnalytics.com; Summary

Data model for Big Data: Illustration
    Why a serialization framework?; Apache Thrift; Limitations of serialization frameworks; Summary

Data storage on the batch layer
    Storage requirements for the master dataset; Choosing a storage solution for the batch layer; How distributed filesystems work; Storing a master dataset with a distributed filesystem; Vertical partitioning; Low-level nature of distributed filesystems; Storing the SuperWebAnalytics.com master dataset on a distributed filesystem; Summary

Data storage on the batch layer: Illustration
    Using the Hadoop Distributed File System; Data storage in the batch layer with Pail; Storing the master dataset for SuperWebAnalytics.com; Summary

Batch layer
    Motivating examples; Computing on the batch layer; Recomputation algorithms vs. incremental algorithms; Scalability in the batch layer; MapReduce: a paradigm for Big Data computing; Low-level nature of MapReduce; Pipe diagrams: a higher-level way of thinking about batch computation; Summary

Batch layer: Illustration
    An illustrative example; Common pitfalls of data-processing tools; An introduction to JCascalog; Composition; Summary

An example batch layer: Architecture and algorithms
    Design of the SuperWebAnalytics.com batch layer; Workflow overview; Ingesting new data; URL normalization; User-identifier normalization; Deduplicate pageviews; Computing batch views; Summary

An example batch layer: Implementation
    Starting point; Preparing the workflow; Ingesting new data; URL normalization; User-identifier normalization; Deduplicate pageviews; Computing batch views; Summary

Part 2: Serving layer

Serving layer
    Performance metrics for the serving layer; The serving layer solution to the normalization/denormalization problem; Requirements for a serving layer database; Designing a serving layer for SuperWebAnalytics.com; Contrasting with a fully incremental solution; Summary

Serving layer: Illustration
    Basics of ElephantDB; Building the serving layer for SuperWebAnalytics.com; Summary

Part 3: Speed layer

Realtime views
    Computing realtime views; Storing realtime views; Challenges of incremental computation; Asynchronous versus synchronous updates; Expiring realtime views; Summary

Realtime views: Illustration
    Cassandra's data model; Using Cassandra; Summary

Queuing and stream processing
    Queuing; Stream processing; Higher-level, one-at-a-time stream processing; SuperWebAnalytics.com speed layer; Summary

Queuing and stream processing: Illustration
    Defining topologies with Apache Storm; Apache Storm clusters and deployment; Guaranteeing message processing; Implementing the SuperWebAnalytics.com uniques-over-time speed layer; Summary

Micro-batch stream processing
    Achieving exactly-once semantics; Core concepts of micro-batch stream processing; Extending pipe diagrams for micro-batch processing; Finishing the speed layer for SuperWebAnalytics.com; Pageviews over time; Bounce-rate analysis; Another look at the bounce-rate-analysis example; Summary

Micro-batch stream processing: Illustration
    Using Trident; Finishing the SuperWebAnalytics.com speed layer; Fully fault-tolerant, in-memory, micro-batch processing; Summary

Lambda Architecture in depth
    Defining data systems; Batch and serving layers; Speed layer; Query layer; Summary

About the authors: Nathan Marz, James Warren
Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.
