The Scalability & Performance Conference

Surge 2015

September 23 - 25, National Harbor, MD

REGISTER NOW

About Surge Conference

Two days of mind-blowing practitioner-oriented sessions presented by some of the most established professionals in our field.

OmniTI has a reputation for scalable web applications and architectures, and a history of sharing the knowledge we learn working on some of the most critical web infrastructure you'll find anywhere.
Surge allows us to continue this mission, bringing the best and brightest in Web Operations to the East Coast. Now in its sixth year, Surge has become the place where thought leaders in scalability and performance gather.

Our Sponsors

Sponsorship Opportunities

Meet and network in the Gaylord National Resort's intimate setting with industry peers and attendees from all over the tech industry, as well as from the media and publishing sectors.

Interested in becoming a sponsor?
Please contact sherry@omniti.com

Download Surge Prospectus

Surge 2015 Speakers

Matt Ranney
UBER
Senior Staff Engineer
Designing for failure: Scaling Uber's backend by breaking everything.
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@mranney
Matt is a Senior Staff Engineer at Uber where he works on architecture and distributed systems. Before Uber, he was co-founder and CTO of Voxer.
Designing for failure: Scaling Uber's backend by breaking everything.
As Uber scales its business to new products in new cities, the requirements for high availability and scalability increase. As the engineering team scales, doubling this year alone, the challenges of building a reliable system grow with it. At our current scale, even brief outages in the service are very costly, both in dollars to the company and in real-world impact on people's lives.

To get better at handling failure and designing for it, we've had to make failures more common. Every new system that we build is subjected to regular failure testing, even databases. This has meant moving away from some of the more comfortable technology choices that worked when we were smaller.

The shift from a smaller service with a few hardened components to a global operation with hundreds of services is as much cultural as it is technical. This talk will cover the Uber architecture and how it handles every failure we can think of. It'll also cover some real outages and how they've influenced our new design.
Lionel Barrow
Braintree
Developer
Evolving High Availability at Braintree
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@lionelbarrow
Lionel is a developer at Braintree and a graduate student at the University of Chicago. He mostly works on backend systems at Braintree and is interested in programming languages and distributed systems.
Evolving High Availability at Braintree
Braintree is a payment gateway: we process payments on behalf of other businesses. Because downtime directly costs our customers money, one of our highest priorities is keeping our API up at all times, so that none of our customers ever misses a payment. At Surge 2013, my colleague Paul Gross gave a talk titled "High Availability at Braintree" in which he detailed the techniques and strategies we use to keep our API up. In this talk, I'll examine what happened as our traffic exploded in the last 2 years: what continued to work, what didn't, and what lessons we learned.

In particular, our infrastructure has changed dramatically. As our traffic has grown, many of the tools that worked well at lower volumes became liabilities, and we've had to change things up. One particular tool I'll focus on is the Broxy, a custom web server we wrote in 2011 and recently phased out due to continuous operational problems. I'll also examine changes to our outbound HTTP stack, monitoring and logging tools, and other components.
Riley Berton
AppNexus
Principal Engineer
Towards Real Time Big Data
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@rileyberton
Riley Berton is a Principal Engineer at AppNexus working on data streaming and processing. He has over 15 years of experience working on distributed and large-scale systems. Prior to AppNexus he designed and built the audio verification system for Viggle, worked on embedded public safety equipment at Eventide, and annoyed lawyers everywhere by creating one of the first electronic legal bill review systems. Nowadays he makes things faster and scalier in streaming data and complex event processing.
Towards Real Time Big Data
Many companies operate Big Data technologies without actually having a Big Data problem. At AppNexus we deal with 175TB of data every calendar day which must be deduped, joined, aggregated and moved around.

This talk will cover:
  • The history of large data processing at AppNexus from MySQL to Netezza to Hadoop to Kafka and custom CEPs, as well as lessons along the way
  • How we process and handle 175TB daily and 1.6BN rows per minute while meeting (and improving) SLAs
  • How we stream this data around the world
  • Where the dividing line is between batch oriented and real time processing
  • The migration path from structured data/unstructured format to structured data/structured format and how we handle these migration paths
  • Ephemeral streams vs. durable streams (AKA, if it doesn't have to hit disk, don't hit disk)
  • At what scale are custom join/agg/dedupe processes necessary WRT hardware cost and generic solutions
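
To put the quoted volumes in perspective, the arithmetic below converts them into sustained per-second rates. It is only a rough sketch: it assumes "TB" means decimal terabytes (10^12 bytes), that the load is spread evenly over the day, and that the 175TB and 1.6BN-rows figures describe the same stream. The abstract states none of these, and the names `DailyLoad`, `Rates` and `sustainedRates` are made up for illustration.

```typescript
// Assumptions (not stated in the abstract): decimal terabytes, an evenly
// spread load, and both figures describing the same data stream.
const BYTES_PER_TB = 1e12;
const SECONDS_PER_DAY = 24 * 60 * 60;
const MINUTES_PER_DAY = 24 * 60;

interface DailyLoad {
  terabytesPerDay: number;
  rowsPerMinute: number;
}

interface Rates {
  bytesPerSecond: number;
  rowsPerSecond: number;
  rowsPerDay: number;
  avgBytesPerRow: number;
}

// Convert daily volume and per-minute row counts into sustained rates.
function sustainedRates(load: DailyLoad): Rates {
  const bytesPerDay = load.terabytesPerDay * BYTES_PER_TB;
  const rowsPerDay = load.rowsPerMinute * MINUTES_PER_DAY;
  return {
    bytesPerSecond: bytesPerDay / SECONDS_PER_DAY,
    rowsPerSecond: load.rowsPerMinute / 60,
    rowsPerDay,
    avgBytesPerRow: bytesPerDay / rowsPerDay,
  };
}

const rates = sustainedRates({ terabytesPerDay: 175, rowsPerMinute: 1.6e9 });
console.log(`${(rates.bytesPerSecond / 1e9).toFixed(2)} GB/s`);              // 2.03 GB/s
console.log(`${(rates.rowsPerSecond / 1e6).toFixed(1)} million rows/s`);     // 26.7 million rows/s
console.log(`${rates.avgBytesPerRow.toFixed(0)} bytes per row on average`);  // 76 bytes per row on average
```

Under those assumptions the pipeline would need to sustain roughly 2 GB and 27 million rows every second, which gives a sense of why the talk asks when custom join, aggregation and dedupe processes become necessary.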
Patrick Meenan
Google
Software Engineer
Web App Performance Measurement, Monitoring and Resiliency
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@patmeenan
Patrick Meenan has been working on web performance in one form or another for the last 14 years and is currently working on Chrome performance at Google. He created the popular open-source WebPagetest web performance measurement tool, runs the free instance of it at http://www.webpagetest.org and can frequently be found in the forums helping site owners understand and improve their website performance.
Web App Performance Measurement, Monitoring and Resiliency
Recent advancements in web browsers expose the ability to measure and report on all sorts of things that used to only be available through testing tools. We will explore what the Navigation Timing, Resource Timing, Server Timing and Network Error Logging interfaces bring to the table, some techniques to get the most out of them and issues with typical performance testing.

We will also dive into the new "Service Workers" capabilities and explore some really crazy things that can be done around measurement, fail-over and failure prevention, all within the browser.
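
As a rough illustration of what the Navigation Timing interface mentioned above makes measurable, the sketch below splits a page load into phases. It is not taken from the talk. The `NavTimingSample` type is a hypothetical subset of the fields a navigation timing entry exposes, the phase names are chosen for this example, and the sample values are invented.

```typescript
// Hypothetical subset of the fields on a Navigation Timing entry.
// All values are milliseconds relative to the start of the navigation.
interface NavTimingSample {
  startTime: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  domContentLoadedEventEnd: number;
  loadEventEnd: number;
}

interface PhaseBreakdown {
  dnsMs: number;              // time spent resolving the host name
  connectMs: number;          // time spent establishing the connection
  waitingMs: number;          // request sent until the first response byte
  downloadMs: number;         // first response byte until the last
  domContentLoadedMs: number; // navigation start until DOMContentLoaded finished
  loadMs: number;             // navigation start until the load event finished
}

// Derive per-phase durations from the raw timestamps.
function breakDownNavigation(t: NavTimingSample): PhaseBreakdown {
  return {
    dnsMs: t.domainLookupEnd - t.domainLookupStart,
    connectMs: t.connectEnd - t.connectStart,
    waitingMs: t.responseStart - t.requestStart,
    downloadMs: t.responseEnd - t.responseStart,
    domContentLoadedMs: t.domContentLoadedEventEnd - t.startTime,
    loadMs: t.loadEventEnd - t.startTime,
  };
}

// Invented sample values, used only to exercise the function.
const sample: NavTimingSample = {
  startTime: 0,
  domainLookupStart: 5,
  domainLookupEnd: 25,
  connectStart: 25,
  connectEnd: 70,
  requestStart: 72,
  responseStart: 180,
  responseEnd: 240,
  domContentLoadedEventEnd: 620,
  loadEventEnd: 910,
};

console.log(breakDownNavigation(sample));
```

In a browser that supports it, the entry returned by `performance.getEntriesByType("navigation")[0]` carries fields with these names, so a real page could pass that entry in place of the invented sample and report the result to its own monitoring endpoint.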
Alec Peterson
Message Systems
CTO
Taking Email In The Cloud To Massive Levels
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@ahpeterson
Alec is the Chief Technology Officer for Message Systems, which handles over 25% of the world’s legitimate email for the likes of Groupon, Twitter, LinkedIn, Time Warner Cable, Pinterest, Zillow, Facebook, PayPal, Comcast and many others. As CTO, he leads the teams responsible for developing all of Message Systems’ software, and he recently took on oversight of the SparkPost operations team as well. He started his career performing such tasks for some of the earliest ISPs, including Panix, Erols and RCN. After that, he was one of the founders of UltraDNS (since acquired by Neustar), where he developed and deployed the first commercial DNS service to be powered by an IP anycast network topology; IP anycast is now the standard for DDoS hardening for all DNS services on the Internet.
Taking Email In The Cloud To Massive Levels
Message Systems software handles over 25% of the world’s legitimate email. Until recently, this has been done through on-premises software that Message Systems sold to its customers. That all changed when we released SparkPost and SparkPost Elite, both of which are cloud manifestations of our software.

In this presentation, we will explore:
  • The decision process we went through determining whether to deploy on bare metal or in a cloud infrastructure provider like AWS
  • The challenges we expected and faced when deploying email at scale in the cloud (Hint: they were quite different)
  • Effective cost modeling of cloud services, and how that can dramatically impact profitability of a service in the cloud
Some challenges we encountered:
  • The performance characteristics of traffic within a cloud environment versus traffic to/from the public Internet
  • Figuring out whether to run our own generic application services (think NoSQL database, load balancers) versus using cloud-provided applications
Ding Yuan
University of Toronto
Assistant Professor
Simple Testing Can Prevent Most Critical Failures
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@dyuan3
Ding Yuan is an assistant professor in the Electrical and Computer Engineering Department of the University of Toronto. He works in computer systems, with a focus on their reliability and performance.
Simple Testing Can Prevent Most Critical Failures
Large, production-quality distributed systems still fail periodically, and do so sometimes catastrophically, where most or all users experience an outage or data loss. We present the result of a comprehensive study investigating 198 randomly selected, user-reported failures that occurred on Cassandra, HBase, Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Redis, with the goal of understanding how one or multiple faults eventually evolve into a user-visible failure.

We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code – the last line of defense – even without an understanding of the software design. We extracted three simple rules from the bugs that have led to some of the catastrophic failures, and developed a static checker, Aspirator, capable of locating these bugs. Over 30% of the catastrophic failures would have been prevented had Aspirator been used and the identified bugs fixed. Running Aspirator on the code of 9 distributed systems located 143 bugs and bad practices that have been fixed or confirmed by the developers.
Speaker Coming Soon
Super cool company
With super cool responsibilities
Come back often or follow us on Twitter to see the next speakers announced.
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Where?

Surge 2015 will be hosted at the Gaylord National Resort in National Harbor, MD.

Gaylord National Resort is the cornerstone of the 300-acre National Harbor waterfront entertainment district, located 8 miles south of Washington DC. The resort offers visitors fine dining and casual restaurants, unique shopping experiences, an indoor pool, and a 20,000-square-foot spa and fitness center. And, for late-night excitement, an express elevator speeds you to the two-story rooftop Pose Ultra-Lounge, the site of 2015's Surge Party, sponsored by Carpathia.

Gaylord National Resort & Convention Center is now accepting hotel reservations for Surge 2015. Through their website you can book, modify, or cancel your hotel reservations at any time.

When?

September 23 - 25, 2015 at the Gaylord National Resort & Convention Center in Maryland.

Why?

The Surge Conference is the place where Web Infrastructure and Scalability experts gather, advance and emerge. It is where you can access the people and ideas that matter most to your company and career.

Friendly Space Policy

OmniTI is dedicated to providing a harassment-free conference experience for everyone. Harassment may include but is not limited to offensive verbal comments, sexual images in public spaces, deliberate intimidation, stalking, unwelcome following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention. Participants asked to stop any harassing behavior are expected to comply immediately. If a participant engages in harassing behavior, the conference organizers may take any action they deem appropriate, including warning the offender or expulsion from the conference.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of conference staff immediately; conference staff will be happy to provide escorts or otherwise assist attendees to feel safe for the duration of the conference. We expect participants to follow these rules at all conference venues and conference-related social events. Conference staff can be identified by special badges.

Register for Surge 2015
