Systems Seminar - CSE
Real Time Analytics@Twitter
Real-time analytics seems to be a buzzword these days. Twitter identified the need for real-time analytics early on and invested in a massive data pipeline that collects, aggregates, and processes large volumes of data in real time. At the heart of the pipeline is Twitter Storm, a real-time stream processing engine widely used at Twitter. Storm is used for real-time data analytics, time series aggregation, and powering real-time features like trending topics. In this talk, we will give an overview of real-time analytics and discuss the Twitter real-time data pipeline and how Storm is used for extracting analytics. We will also discuss the challenges we faced and the lessons we learned while building this infrastructure at Twitter.
Karthik is the engineering manager and technical lead for Real Time Analytics at Twitter. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He cofounded Locomatix, a company specializing in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high-availability solutions for network routers that are widely deployed in the Internet. Before joining Juniper, he was at the University of Wisconsin, where he worked extensively on parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out into a company that was later acquired by Teradata.
He is the author of several publications and patents, as well as one of the best-selling books, "Network Routing: Algorithms, Protocols and Architectures." He has a Ph.D. in Computer Science from UW Madison with a focus on databases.