The Scalability & Performance Conference

Surge 2015

September 23 - 25, National Harbor, MD

About Surge Conference

Two days of mind-blowing practitioner-oriented sessions presented by some of the most established professionals in our field.

OmniTI has a reputation for scalable web applications and architectures, and a history of sharing the knowledge we learn working on some of the most critical web infrastructure you'll find anywhere.
Surge allows us to continue this mission, bringing the best and brightest in Web Operations to the East Coast. Now in its sixth year, Surge has become the place where thought leaders in scalability and performance gather.

When?

September 23 - 25, 2015 at the Gaylord National Resort and Conference Center in Maryland.

Why?

The Surge Conference is the place where Web Infrastructure and Scalability experts gather, advance and emerge. It is where you can access the people and ideas that matter most to your company and career.

Sponsorship Opportunities

Meet and network in the Gaylord National Resort's intimate setting with industry peers and attendees from all over the tech industry, as well as from the media and publishing sectors.

Interested in becoming a sponsor?
Please contact sherry@omniti.com

Surge 2015 Speakers

KEYNOTE SPEAKER » Rhona Flin
University of Aberdeen
Emeritus Professor of Applied Psychology - Director of the Industrial Psychology Research Centre
Safety at the Sharp End: Skills for Decision Making Under Pressure
THURSDAY, 9:00 AM
Rhona Flin (PhD, FBPsS, FRSE) is Emeritus Professor of Applied Psychology at the University of Aberdeen. As Director of the Industrial Psychology Research Centre at the University of Aberdeen, she led a team of psychologists conducting research on human performance in high-risk industries (www.abdn.ac.uk/iprc). Her group’s projects included studies of leadership, culture, team skills and decision making in healthcare, aviation and the energy industries. She is currently studying senior managers’ safety leadership and also non-technical skills in surgery and in the oil and gas sector. She was awarded the Roger Green Medal (Royal Aeronautical Society) for aviation human factors research and the John Bruce Medal (College of Surgeons Edinburgh) for behavioural science in surgery. She is a member of the Safety Advisory Committee for the Military Aviation Authority at the UK Ministry of Defence. Her books include Safety at the Sharp End: A Guide to Non-Technical Skills (2008) and Enhancing Surgical Performance: A Primer on Non-Technical Skills (2015), and she has published over 130 scientific articles.
Alec Peterson
Message Systems
CTO
Taking Email In The Cloud To Massive Levels
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@ahpeterson
Alec is the Chief Technology Officer for Message Systems, which handles over 25% of the world’s legitimate email for the likes of Groupon, Twitter, LinkedIn, Time Warner Cable, Pinterest, Zillow, Facebook, PayPal, Comcast and many others. As CTO, he leads the teams responsible for developing all of Message Systems’ software, and he has recently added oversight of the SparkPost operations team to his responsibilities. He started his career performing such tasks for some of the earliest ISPs, including Panix, Erols and RCN. After that, he was one of the founders of UltraDNS (since acquired by Neustar), where he developed and deployed the first commercial DNS service to be powered by an IP anycast network topology; IP anycast is now the standard for DDoS hardening for all DNS services on the Internet.
Taking Email In The Cloud To Massive Levels
Message Systems software handles over 25% of the world’s legitimate email. Until recently, this was done through on-premise software that Message Systems sold to its customers. That all changed when we released SparkPost and SparkPost Elite, both of which are cloud versions of our software.

In this presentation, we will explore:
  • The decision process we went through determining whether to deploy on bare metal or in a cloud infrastructure provider like AWS
  • The challenges we expected and faced when deploying email at scale in the cloud (Hint: they were quite different)
  • Effective cost modeling of cloud services, and how that can dramatically impact profitability of a service in the cloud
Some challenges we encountered:
  • The performance characteristics of traffic within a cloud environment versus traffic to/from the public Internet
  • Figuring out whether to run our own generic application services (think NoSQL database, load balancers) versus using cloud-provided applications
Amie Durr
Message Systems
Vice President Product Management
Getting off the Ground: Lessons on becoming cloud first and how to really take flight
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@AmieDurr

Amie is responsible for delivering technologies that help businesses support and drive their messaging needs with a focus on scale, usability, engagement, and analytics. As the only person she knows with a background in both Mathematics and Anthropology, Amie has spent her career marrying her love of data and technology with her love of understanding people and processes. She is an evangelist and strong supporter of innovation and generally doing cool stuff, championing both the voice of the customer and the market, as well as the creative, technical skills of the engineers.
Getting off the Ground: Lessons on becoming cloud first and how to really take flight
“As-a-Service.” These days, it feels like you can put almost any technology-oriented word in front of that phrase and you’re likely to find a well-established market, with new vendors emerging every day to compete for their fair share. PaaS, IaaS, SaaS. Many of the vendors supporting these services are brand new companies, with new technology that’s been built to support needs specifically driven by the cloud. There are even more vendors, however, that began their lives more traditionally and are now attempting to make the transition from on-premise deployment models to cloud and service-oriented ones. So how hard could that transition be, and what does it really mean for an organization, beyond “just” changing the delivery mechanism? And how do you do that while continuing to support and innovate for your existing on-premise customers? In this talk, we’ll discuss the lessons learned one year after deciding to move from a company focused purely on on-premise software delivery to one that is definitively cloud-first.
Plan to walk away with:
  • An appreciation of the necessary shifts (cultural and technological) and all of the different areas that are impacted, both internally and externally.
  • An understanding of what new teams may be needed, who should run them, and how to scale both new and existing teams quickly and productively
  • 5 mistakes we made in our transformation to the cloud
Brett Huff
Fog Creek
Vice President of Engineering
Tradeoffs: How to make hard decisions
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@brett_r_huff

Brett is a Software Manager and Developer currently working as Vice President of Engineering at Fog Creek. He has previously worked on the high bandwidth, low latency trading systems at Goldman Sachs and on the intermittent, downhole networks of offshore oil drilling rigs with NOV IntelliServ. He currently enjoys working remotely from his home in Idaho with the minds that brought us Stack Exchange, Trello, and FogBugz.
Tradeoffs: How to make hard decisions
Best practices are best and Architecture can be perfect, but what do you do when the ideal world of conference talks meets the messy world of business, legacy systems, and imperfect developers? I want to build out a micro-services architecture in Haskell, but I need to ship existing code to feed my family and my team only knows Java. At some point we all run into tough choices where not all desires can be met. What then?

In this talk we’ll explore different ways to look at these tough choices. Why would it be okay to rewrite a system? When can we deal with slow performance? How do I convince my manager that we really do need this expensive server? I include practical examples of decisions from FogBugz, Trello, Stack Exchange, and more.
Bryan Cantrill
Joyent
CTO
Docker in Production: Tales From the Engine Room
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@bcantrill

Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system software, from the guts of the kernel to client-code on the browser. In particular, he co-designed and implemented DTrace, a facility for dynamic instrumentation of production systems that won the Wall Street Journal's top Technology Innovation Award in 2006 and the USENIX Software Tools User Group Award in 2008. Bryan also co-founded the Fishworks group at Sun, where he designed and implemented the DTrace-based analytics facility for the Sun Storage 7000 series of appliances. Bryan received the ScB magna cum laude with honors in Computer Science from Brown University.
Docker in Production: Tales From the Engine Room
Docker has surged in popularity in the last two years, but there have been open questions about its suitability for production in terms of security and scale. At Joyent, we approached the problem from a different perspective: having deployed containers in production for nearly a decade, we knew that the underlying technology was ready -- but could developer-friendly Docker meet the needs of modern operations teams? In developing and deploying a public, multi-tenant container-native Docker service, we learned some hard lessons -- but also made some exciting discoveries along the way.

In this session, Joyent CTO Bryan Cantrill will share our experiences and point to the future of containers in production.
Clinton Wolfe
OmniTI
DevOps Practice Lead
Managing Your Product Manager: Building relationships to ensure an operable product
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@clintoncwolfe
Clinton Wolfe leads the DevOps Practice at OmniTI, which means he voluntarily chooses to go into heavily siloed, dysfunctional organizations and try to get them to talk to each other with as few stabbings as possible. He's especially interested in testable infrastructure, and the processes needed to support quality throughout the application lifecycle. He is also a Daddy.
Managing Your Product Manager: Building relationships to ensure an operable product
People from a "Product background" often have zero technical experience, but find themselves needing to dictate the deliverables. Product owners are under great pressure from Marketing and Leadership to focus on "features" from a customer perspective; the so-called "non-functional requirements" often fall by the wayside. Operability (monitorability, recoverability, availability, and performance, among other aspects) is difficult to bake into an application that was developed without such consideration.

This talk will present practical approaches to bridge-building between Ops and Product. Focusing especially on cross-functional Agile teams with leadership with little or no Ops background, we will explore whether "planning the work will result in the planned work being the work that is done." When working with a mixed team, doing development, deployment, incident response, and everything in support of that, such plans go off the rails. Methods of championing Ops needs while avoiding "the sky is falling" perceptions will be presented. What kinds of unplanned work exist? Are there steps we can take to convert unplanned work into planned work? How does work flow through the team? How does unplanned work disrupt the flow?

There will be pie charts, but no pie.
Ding Yuan
University of Toronto
Assistant Professor
Simple Testing Can Prevent Most Critical Failures
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@dyuan3
Ding Yuan is an assistant professor in the Electrical and Computer Engineering Department of the University of Toronto. He works in computer systems, with a focus on their reliability and performance.
Simple Testing Can Prevent Most Critical Failures
Large, production quality distributed systems still fail periodically, and do so sometimes catastrophically, where most or all users experience an outage or data loss. We present the result of a comprehensive study investigating 198 randomly selected, user-reported failures that occurred on Cassandra, HBase, Hadoop Distributed File System (HDFS), Hadoop MapReduce, and Redis, with the goal of understanding how one or multiple faults eventually evolve into a user-visible failure.

We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code – the last line of defense – even without an understanding of the software design. We extracted three simple rules from the bugs that have led to some of the catastrophic failures, and developed a static checker, Aspirator, capable of locating these bugs. Over 30% of the catastrophic failures would have been prevented had Aspirator been used and the identified bugs fixed. Running Aspirator on the code of 9 distributed systems located 143 bugs and bad practices that have been fixed or confirmed by the developers.
Emily Dresner
Luminal, Inc.
Director of Product Development
Containers, the Good, the Bad and the Ugly
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@multiplexer

Emily Dresner is Director of Product Development at Luminal, Inc., where she leads teams in architecting and building novel, cutting-edge orchestration and control layers for public and private cloud. Previously, she was the principal architect for ZeniMax Online Studios, where she led teams integrating build systems, configuration management, content delivery, web and backend systems at scale in data centers across two continents. She holds engineering degrees from the University of Michigan.
Containers, the Good, the Bad and the Ugly
Containerization has been a feature of the computing landscape since the 1990s, although the technology matured only in the last decade. Sun shipped Solaris Zones in full release in 2005, the same year OpenVZ first appeared on the scene. LXC was officially supported in the Linux kernel with its initial release in 2008. Containerization remained a corner case in distributed computing while virtualization ruled the general computing landscape (ESXi, Xen, KVM, etc.). With the popularity of Docker, containerization reappeared explosively on the DevOps scene as a tool in an already large toolset. But are containers the panacea for ops departments' woes that marketing departments would like everyone to believe? This talk looks into what makes a container, cases where containers shine, where they fail, where they _really_ fail, and the right places to choose containerization for an infrastructure deployment project.
Eric Schrock
Delphix
VP of Engineering
Scaling Mentorship Through Management
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@ericschrock

Eric is VP of Engineering at Delphix, where he has been scaling the product and team since 2010. He has always been passionate about solving hard systems problems, and has spent most of his career tackling the challenges of managing data in the enterprise. Prior to Delphix, Eric was a founding member of the Fishworks team at Sun that built the ZFS Storage Appliance product line. He started his career as a kernel hacker in the Solaris group after earning a BS in Computer Science from Brown.
Scaling Mentorship Through Management
Like many of us, my engineering ethos was forged in the crucible of a giant company. But working within the relatively small Solaris kernel group, and later the very small Fishworks team, I saw amazing engineers driving innovation despite management mediocrity. Like many of us, I came to the seemingly natural conclusion: management is toxic, simply a barrier to great engineers doing great things.

So when I joined a startup as an individual contributor, became a manager, and eventually VP of Engineering, I found myself suddenly on the other side of the fence. Would I reject management outright, building the flattest, self-assembling mesh of an org the world had ever seen? Or would I turn Benedict Arnold and embrace the very thing I found toxic only years before?

Embracing either extreme yields a culture with incidental mentorship, so I instead sought a different path, one that embraced mentorship as a core value. In this talk, I’ll discuss the positives and negatives of this choice as we’ve scaled the Delphix engineering team from 10 to 100, including:
  • How we structure the organization to guide and empower individuals to do great things
  • How we build product without a rigid encoding of ownership
  • How we weave mentorship into our culture and support it within management
Hooman Beheshti
Fastly
VP Technology
Measuring CDN performance and why you’re doing it wrong
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Hooman Beheshti is VP of Technology at Fastly, where he develops web performance services for the world’s smartest CDN platform. A pioneer in the application acceleration space, Hooman helped design one of the original load balancers while at Radware and has held senior technology positions with Strangeloop Networks and Crescendo Networks. He has worked on the core technologies that make the Internet work faster for nearly 20 years and is an expert and frequent speaker on the subjects of load balancing, application performance, and content delivery networks.
Measuring CDN performance and why you’re doing it wrong
Integrating content delivery networks into your application infrastructure can offer many benefits, including major performance improvements for your applications. So understanding how CDNs perform — especially for your specific use cases — is vital. However, testing for measurement is complicated and nuanced, and results in metric overload and confusion. It's becoming increasingly important to understand measurement techniques, what they're telling you, and how to apply them to your actual content.

In this session, we'll examine the challenges around measuring CDN performance and focus on the different methods for measurement. We'll discuss what to measure, important metrics to focus on, and different ways that numbers may mislead you.
More specifically, we'll cover:
  • Different techniques for measuring CDN performance
  • Differentiating between network footprint and object delivery performance
  • Choosing the right content to test
  • Core metrics to focus on and how each impacts real traffic
  • Understanding cache hit ratio, why it can be misleading, and how to measure for it
Jos Boumans
Krux
VP of Technical Operations
The Metamorphosis of High Scale Message Handling
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@jiboumans

Jos Boumans is responsible for Infrastructure and Operations at Krux. He has over 15 years of software development and operations experience, both in the enterprise and in the open source community. Prior to Krux, Jos ran the team behind the Ubuntu Linux server distribution at Canonical and he ran the RIPE Database, which is responsible for all of the authoritative IP address data in Europe, the Middle East & Asia.
The Metamorphosis of High Scale Message Handling
Keeping things real time while scaling up, and keeping reliability & cost under control, is a non-trivial challenge. We went through the exercise of kicking the tires on various solutions, and settled on Kafka + augmentations, as well as some additional open source software we wrote.

In the talk, I’ll dive deep into all of those aspects. The result should serve as a field guide to avoiding the pitfalls we encountered and, if so desired, to building like Krux did.
Laine Campbell
ShopWithMe
CTO
Database Reliability Engineering, Modernizing the DBA Role
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@lainevcampbell
Laine Campbell is currently the CTO of ShopWithMe and was formerly AVP of Pythian’s open-source database practice, CEO and co-founder of Blackbird, and a founder of PalominoDB. Laine has been an Oracle, MySQL and Cassandra DBA architect and designer for 16 years with such organizations as Obama for America, Travelocity, Zappos, Chegg, LiveJournal, Disney Mobile, and Adobe. Laine is also an open-source proponent, and advocate for bringing technology, job opportunities, and privileges to underserved populations.
Database Reliability Engineering, Modernizing the DBA Role
Consider this a new database administration primer, focused on teaching developers and Systems Administrators about the core concepts of Database Operations within today's IT paradigms, including continuous deployment and delivery, DevOps culture, Infrastructure as Code and Cloud/Virtualized environments.

Database Reliability Engineering
  • Site Reliability Engineering Overview/History
  • DBA Overview/History - Traditional roles, compartmentalization
  • Today's Operational DBA, with a focus on scripting/automation, multiple datastores and virtualization
  • Today’s DBRE: global focus, with an emphasis on tools, instrumentation and continuous improvement
Design | Deploy | Maintain
  • Knowing your Datastore, the CAP theorem applied to real workloads
  • Scalability - Understanding typical db constraints, patterns & anti-patterns
  • Observability, focused on instrumentation of db components
  • Configuration Management and Deployment
  • Data Stewardship - Rule 1, Protect the Data
  • Disaster Preparedness and Business Continuity
Lionel Barrow
Braintree
Developer
Evolving High Availability at Braintree
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@lionelbarrow
Lionel is a developer at Braintree and a graduate student at the University of Chicago. He mostly works on backend systems at Braintree and is interested in programming languages and distributed systems.
Evolving High Availability at Braintree
Braintree is a payment gateway: we process payments on behalf of other businesses. Because downtime directly costs our customers money, one of our highest priorities is keeping our API up at all times, so that none of our customers ever misses a payment. At Surge 2013, my colleague Paul Gross gave a talk titled "High Availability at Braintree" in which he detailed the techniques and strategies we use to keep our API up. In this talk, I'll examine what happened as our traffic exploded in the last 2 years: what continued to work, what didn't, and what lessons we learned.

In particular, our infrastructure has changed dramatically. As our traffic has grown, many of the tools that worked well at lower volumes became liabilities, and we've had to change things up. One particular tool I'll focus on is the Broxy, a custom web server we wrote in 2011 and recently phased out due to continuous operational problems. I'll also examine changes to our outbound HTTP stack, monitoring and logging tools, and other components.
Matt Ranney
UBER
Senior Staff Engineer
Designing for failure: Scaling Uber's backend by breaking everything.
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@mranney
Matt is a Senior Staff Engineer at Uber where he works on architecture and distributed systems. Before Uber, he was co-founder and CTO of Voxer.
Designing for failure: Scaling Uber's backend by breaking everything.
As Uber scales its business to new products in new cities, the requirements for high availability and scalability increase. As the engineering team scales, doubling this year alone, the challenges of building a reliable system grow with it. At our current scale, even brief outages in the service are very costly, both in dollars to the company and with real world impact on people's lives.

To get better at handling failure and designing for it, we've had to make failures more common. Every new system that we build is subjected to regular failure testing, even databases. This requires some technology choices that differ from the more comfortable ones that worked when we were smaller.

The shift from a smaller service with a few hardened components to a global operation with hundreds of services is as much cultural as it is technical. This talk will cover the Uber architecture and how it handles every failure we can think of. It'll also cover some real outages and how they've influenced our new design.
Mikhail Panchenko
Opsmatic
CTO
The Outage We Had During Surge 2014
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@mihasya
Mike is the CTO of Opsmatic. He works with his team on building a reliable service for making the daily lives of frontline operators and engineers better. He came to this gig after experiencing much pain during his own daily life, building and operating a few large-scale infrastructures at Urban Airship, SimpleGeo, Flickr, and Yahoo.
The Outage We Had During Surge 2014
For as long as it's existed, Surge has been a place to share your horror outage stories. This year my horror story comes with a bonus: the outage I'd like to talk about happened DURING Surge 2014! It was a spectacularly complex, cascading failure replete with hidden interactions, a time bomb-like build up, false assumptions, and misguided response. An epic fail. A comedy of errors. A colossal blunder.

As we traverse the chain of events that caused me to miss most of the Thursday talks last year, we'll dig into the decisions and aspirations that led us down the seemingly safe path that ended in a fireball.

We will touch on the following themes, and more!
  • Automation. How much is too much?
  • Complex Systems - an olde tyme crowd pleaser
  • Monitoring and forensics - a brief portion of the talk wherein I'll allow myself to talk about how we used our product to debug an outage in our product.
Nathan Taylor
Fastly
Software engineer
Racing to Win: Race Conditions in Correct Protocols
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@dijkstracula

Nathan Taylor is an Oakland-based software developer currently employed at Fastly, where he works on making the Web faster through high performance content delivery. Previous gigs have included hacking on low-level systems software such as Java runtimes at Twitter and, prior to that, the Xen virtual machine monitor in grad school. Originally a Trombone major, he holds an MSc from UBC, where he researched full-system dynamic analysis and binary rewriting tools. When not in front of a computer, Nathan is likely to either be making granola or suffering up a steep hill on his road bike.
Racing to Win: Race Conditions in Correct Protocols
If you've ever worked on parallel or multiprocessor software, you've almost certainly encountered bugs owing to race conditions between concurrently-executing components. While race conditions intuitively seem bad, it turns out there are cases in which we can use them to our advantage! In this talk, we'll discuss a number of ways that race conditions are used in improving throughput and reducing latency in high-performance systems, without sacrificing correctness along the way.

We'll begin this exploration with a discussion of how various mutual exclusion primitives like locks are implemented efficiently in modern hardware using benign race conditions. From there, we'll investigate how one can implement non-blocking algorithms and concurrent data structures in a correct and deterministic manner using freely-available open source libraries.
Patrick Meenan
Google
Software Engineer
Web App Performance Measurement, Monitoring and Resiliency
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@patmeenan
Patrick Meenan has been working on web performance in one form or another for the last 14 years and is currently working on Chrome performance at Google. He created the popular open-source WebPagetest web performance measurement tool, runs the free instance of it at http://www.webpagetest.org and can frequently be found in the forums helping site owners understand and improve their website performance.
Web App Performance Measurement, Monitoring and Resiliency
Recent advancements in web browsers expose the ability to measure and report on all sorts of things that used to only be available through testing tools. We will explore what the Navigation Timing, Resource Timing, Server Timing and Network Error Logging interfaces bring to the table, some techniques to get the most out of them and issues with typical performance testing.

We will also dive into the new "Service Workers" capabilities and explore some really crazy things that can be done around measurement, fail-over and failure prevention, all within the browser.
Paul Khuong
AppNexus
Team Lead of Publisher - Engineering
Greenspunning for fun, boredom and profit
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

@pkhuong

Paul Khuong leads a small team of developers at AppNexus. In a past life, he worked on Steel Bank Common Lisp until his funding ran out. He then promptly obtained a PhD for a side project in mathematical optimisation and decided to join AppNexus in order to hack in C and x86-64 assembly, the polar opposites of Lisp. Obviously, mistakes were made.
Greenspunning for fun, boredom and profit
Philip Greenspun famously observed that "Any sufficiently complicated C or Fortran program contains an ad hoc informally-specified, bug-ridden, slow implementation of half of Common Lisp." This is the story of how we, at AppNexus, are slowly proving Greenspun half right--hopefully for the better.

The applications I work on are almost entirely written in multithreaded C, mostly for performance reasons. This decision makes development, hiring, and on-boarding more exciting and difficult than they should be. Over time, we've adopted static type checking hacks, weak forms of aspect-oriented programming and of exception handling and crash recovery, and multiversion concurrency control. Thanks to these improvements, we detected several three-star programming bugs at compile-time, traced through and understood rare erroneous states that we could not reproduce in controlled environments, mitigated several outages, and outright ruled out a few classes of locking bugs.

I'll discuss how we managed to evolve a (mostly) working system to incrementally improve reliability and approachability without ever attempting a complete re-implementation. However, I will also argue that it sometimes makes more sense to attach guardrails to C than to work in a mainstream "safe" language, regardless of history; in fact, doing so selectively can be a boon not only to development velocity and on-boarding, but also to reliability and runtime performance and consistency... particularly for distributed systems.
Riley Berton
AppNexus
Principal Engineer
Towards Real Time Big Data
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Riley Berton
AppNexus • Principal Engineer
@rileyberton
Riley Berton is a Principal Engineer at AppNexus working on data streaming and processing. He has over 15 years of experience working on distributed and large scale systems. Prior to AppNexus he designed and built the audio verification system for Viggle, worked on embedded public safety equipment at Eventide, and annoyed lawyers everywhere by creating one of the first electronic legal bill review systems. Nowadays he makes things faster and scalier in streaming data and complex event processing.
Towards Real Time Big Data
Many companies operate Big Data technologies without actually having a Big Data problem. At AppNexus we deal with 175TB of data every calendar day which must be deduped, joined, aggregated and moved around.

This talk will cover:
  • The history of large data processing at AppNexus from MySQL to Netezza to Hadoop to Kafka and custom CEPs, as well as lessons along the way
  • How we process and handle 175TB daily and 1.6BN rows per minute while meeting (and improving) SLAs
  • How we stream this data around the world
  • Where the dividing line is between batch oriented and real time processing
  • The migration path from structured data/unstructured format to structured data/structured format and how we handle these migration paths
  • Ephemeral streams vs. durable streams (AKA, if it doesn't have to hit disk, don't hit disk)
  • At what scale are custom join/agg/dedupe processes necessary WRT hardware cost and generic solutions
Ryan Roemer
Formidable Labs
CTO and co-founder
Wrangling Large Scale Frontend Web Applications
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Ryan Roemer
Formidable Labs • CTO and co-founder
@ryan_roemer

Ryan is the CTO and co-founder of Formidable Labs, a boutique development shop in (the delightfully weird neighborhood of) Fremont in Seattle, WA. He helps lead the Seattle Node.js Meetup, curates the Server Day for Cascadia JSFest 2015, and is the author of “Backbone.js Testing”, a comprehensive test development guide for modern Backbone.js web applications.

Ryan architects full-stack applications and backend Node.js services, and leads frontend development groups ranging from small startups to Fortune 500 engineering teams. Previously, Ryan was a distributed systems engineer, and in his deep, dark past was a patent attorney, although it has been a long time since he has put on his “lawyer” hat.
Wrangling Large Scale Frontend Web Applications
Web applications are massively shifting to the frontend, thanks to exciting new JavaScript / CSS technologies, expanding browser capabilities (visualizations, real-time apps, etc.) and faster perceived user experiences. However, client web applications can be a nightmare to maintain at scale, even for seasoned software architects and operations engineers. Deployment and production infrastructures are complex and rapidly changing. And, frontend JavaScript / CSS code ships to browsers worldwide, where errors and issues are notoriously difficult to systematically detect and diagnose.

In this talk, we will tackle the wild west of the frontend with pragmatic steps and seasoned advice from helping organizations from startups to Fortune 500 companies create some of the largest frontend web applications on the Internet.
Some of the topics we will cover include:
  • Managing and building very large (500K+ line) frontend application / test code bases.
  • Surviving production traffic and errors on the frontend and handling spikes like Black Friday / Cyber Monday for one of the highest traffic e-commerce websites in existence.
  • How, where, and why your frontend application is likely to fail.
  • Monitoring, logging, and debugging frontend web applications out in the wild.
  • Automating checks, tests, and code introspection to protect your code in production.
  • Creating an effective, fast, and engineer-friendly development-test-deployment frontend pipeline.
Slade Mitchell
Comcast
VP, Solution Architecture
Surviving Partial Failure in a Microservices Jungle
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Slade Mitchell
Comcast • VP, Solution Architecture
@sldaemitchell

Slade is the architecture lead within the advanced application development organization that designed and built the Xfinity X1 TV experience for Comcast customers. His interests and focus include the development and deployment of highly reliable and available distributed solutions at large scale.
Surviving Partial Failure in a Microservices Jungle
Comcast's TV products serve tens of millions of customers and are powered by a suite of dozens of services that are continuously developed and operated by hundreds of technical staff. While we have enjoyed many of the touted benefits of a microservice architecture (looser coupling between teams, independent deployments), we have also encountered the corresponding challenges. In particular, we've learned that operating a platform composed of this many services in a reliable fashion is fraught with peril. Delivering business value can seem like hacking your way through the wilderness at times.

In this talk, we'll start by briefly reviewing some "ancient" (20 years old!) literature: "partial failures" make distributed systems fundamentally different. When an application can have some parts fail while other parts continue working, it can be difficult to reason about overall correctness.

We'll go over three main strategies for surviving partial failures:
  • using idempotent service interfaces
  • placing service boundaries between optional or less-critical functionality
  • recombining services
Each "survival tip" will be explained through a concrete example from our experience. By the end of the talk, attendees will be familiar with the pitfalls of partial failures, the main technical weakness of a microservice architecture, but will also be armed with techniques to successfully avoid them.
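As a rough illustration of the first strategy, idempotent service interfaces, here is a minimal sketch. It is not taken from the talk or from Comcast's systems: `BookingService`, `book`, `call_with_retry`, and the request-id scheme are all hypothetical names chosen for the example. The idea shown is that a client can safely retry after a lost response, because the service recognises a repeated request id and returns the stored result instead of repeating the side effect.

```python
class BookingService:
    """Hypothetical service whose `book` operation is idempotent per request id."""

    def __init__(self):
        self._results = {}  # request_id -> result of the first successful call
        self.bookings = []  # side effects actually performed

    def book(self, request_id, seat):
        # A retried request with a known id returns the stored result
        # instead of performing the side effect a second time.
        if request_id in self._results:
            return self._results[request_id]
        self.bookings.append(seat)
        result = {"status": "booked", "seat": seat}
        self._results[request_id] = result
        return result


def call_with_retry(send, request_id, payload, attempts=3):
    """Client side: reuse the same request id on every retry."""
    for attempt in range(attempts):
        try:
            return send(request_id, payload)
        except TimeoutError:
            # The request may or may not have been applied; retrying is
            # safe only because the interface is idempotent.
            if attempt == attempts - 1:
                raise
```

Without the request id, the client could not distinguish "the request was never applied" from "the request was applied but the response was lost", which is exactly the partial-failure ambiguity described above.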
Eric Sproul
Circonus
Release Engineer
SaaS for the Enterprise - Automated, cross-platform software and service delivery
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Eric Sproul
Circonus • Release Engineer

As Release Engineer for Circonus, Eric is responsible for software packaging and deployment automation for SaaS and on-premise installations. He designs and maintains continuous delivery and continuous integration processes to achieve the highest quality experience for administrators deploying Circonus. He has extensive experience in IT Operations, data center management and site reliability engineering from his work with ISPs and at OmniTI, where he led teams that supported some of the largest infrastructures in the world.
SaaS for the Enterprise - Automated, cross-platform software and service delivery
Circonus is a comprehensive monitoring and analytics platform, available as public SaaS to anyone or as private SaaS for enterprises with on-premise requirements. The platform is available on OmniOS or Linux, using a complex stack of hundreds of individual software packages. Managing and packaging Circonus for continuous delivery require extensive automation, rigid control over proprietary and open-source software components, and creative use of configuration management tools to consistently deliver software across multiple platforms.

Eric will share the processes and tools used for continuous delivery and continuous integration at Circonus - including the advanced package dependency features available in OmniOS, automated package creation and repository management, and more.
Ian Evans
Carpathia
Cloud Architect
Today’s Balancing Act: How to Increase Security While Simultaneously Improving Application Performance & Availability
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Ian Evans
Carpathia • Cloud Architect

Ian Evans is an accomplished Cloud Architect with more than 17 years of systems, network and infrastructure planning experience spanning multiple industries including Healthcare, Oil and Gas, Hospitality and the Federal Government. In his current role as a Cloud Architect at Carpathia, Ian uses his deep understanding of cloud backbone systems, processes, management tools, and techniques to help customers construct, modify, operate, and maintain geographically dispersed cloud architectures that meet their unique needs. Prior to joining Carpathia, Ian held multiple engineering and product/account management positions at Verizon, Amazon Web Services and Systems Technology Forum, Ltd. Ian holds a bachelor's degree in business information management from Bellevue University.
Today’s Balancing Act: How to Increase Security While Simultaneously Improving Application Performance & Availability
Enterprises in today’s digital ecosystem face a tough balancing act. On one hand, they must meet the demands of a global audience by delivering a high-quality online experience anywhere, any time, and on any device. But on the other hand, sophisticated hackers threaten to wreak havoc on IT systems at any moment and cause consumers to lose valuable trust. This places an increased burden on digital media providers, as they must spend time ensuring their data and applications are properly secured instead of on core business activities.
This presentation will cover:
  • The impacts sophisticated hackers and compliance regulations such as PCI DSS and SOX have on today’s digital media providers
  • How a hybrid cloud approach better balances security and compliance with the delivery of high performance applications and services
  • How trends like containers, microservices, and real-time big data will continue to shape this complex environment
  • How cost-effective cloud commodity hardware solutions are driving down cost without sacrificing innovation or performance
  • Lessons learned from similar businesses in the industry
Nathen Harvey
Chef
Community Director
Chef Training
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Nathen Harvey
Chef • Community Director
@nathenharvey

As the Community Director at Chef, Nathen helps the community whip up an awesome ecosystem built around the Chef platform. Nathen also spends much of his time helping people learn about the practices, processes, and technologies that support DevOps, Continuous Delivery, and high velocity organizations. Prior to joining Chef, Nathen spent a number of years managing operations and infrastructure for a diverse range of web applications. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps. Nathen helps organize ChefConf, the Chef Community Summit, DevOpsDays DC, and the DevOpsDC meetup group.
Chef Training
I'll be leading the tutorial on Chef and we're going to have a lot of fun and learn a thing or two along the way.

We'll talk about infrastructure as code, DevOps, continuous delivery, and compliance. There will be lots of hands-on time and we need to be sure everyone comes prepared and ready to dive right in. Review the prerequisites on GitHub to make sure you're ready to go.

The most important prerequisites
What questions can I answer for you?
Rajiv Kurian
SignalFx
Software Engineer
Scaling Ingest Pipelines with High Performance Computing Principles
THURSDAY, 11:30 AM
In Cherry Blossom | SCALING ARCHITECTURE

Rajiv Kurian
SignalFx • Software Engineer

Rajiv Kurian is a software engineer with over 5 years of experience building high performance distributed systems. He has worked on database engines, networking protocols and image processing. At SignalFx, Rajiv works on improving the performance of the ingest pipeline.
Scaling Ingest Pipelines with High Performance Computing Principles
At SignalFx, we deal with high-volume high-resolution data from users for production monitoring use cases. This requires a high performance ingest pipeline. Over time we’ve found that we needed to adapt architectural principles from specialized fields such as HPC to get beyond performance plateaus encountered with more generic approaches.

Some key examples include:
  • Use compact, array based data structures with minimal indirection, instead of pointer based data structures that encourage heap allocation.
  • Write simple single threaded code, instead of complex algorithms. Parallelize by running multiple copies of simple single threaded code, instead of using concurrent algorithms.
  • Separate the data plane from the control plane and do not slow the data plane because of control design.
  • Apply smart batching and compact encoding to IO workloads.
This presentation will talk about the performance plateaus we have faced and provide several examples of putting such principles into practice. It will also show before/after results we’ve experienced in the performance of our own services. We believe these lessons will be useful to anyone.
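
To make two of the listed principles concrete (compact array-based structures, and batching with a compact encoding), here is a small sketch. It is not SignalFx's implementation: `SampleBatch`, `encode`, `decode`, and the wire layout (a 4-byte count followed by the raw timestamp and value columns) are assumptions made up for this illustration. Python's typed `array` stands in for the flat, pointer-free buffers a lower-level language would use.

```python
import struct
from array import array


class SampleBatch:
    """Fixed-capacity batch of (timestamp, value) samples in flat typed arrays."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.timestamps = array("q")  # signed 64-bit integers, stored contiguously
        self.values = array("d")      # 64-bit floats, stored contiguously

    def append(self, timestamp, value):
        """Add one sample; return True when the batch is full and should be flushed."""
        self.timestamps.append(timestamp)
        self.values.append(value)
        return len(self.timestamps) >= self.capacity


def encode(batch):
    """Serialise a whole batch in one go: count header plus raw column bytes.

    The columns use the machine's native byte order (an assumption that both
    ends of the pipeline share an architecture).
    """
    header = struct.pack("<I", len(batch.timestamps))
    return header + batch.timestamps.tobytes() + batch.values.tobytes()


def decode(payload):
    """Inverse of `encode`: rebuild the list of (timestamp, value) pairs."""
    (count,) = struct.unpack_from("<I", payload, 0)
    timestamps, values = array("q"), array("d")
    start = 4
    middle = start + 8 * count
    timestamps.frombytes(payload[start:middle])
    values.frombytes(payload[middle:middle + 8 * count])
    return list(zip(timestamps, values))
```

Each sample costs 16 bytes on the wire and the whole batch is written with a single IO call, in contrast to allocating and serialising one heap object per sample.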

Where?

Surge 2015 will be hosted at the Gaylord National Resort in National Harbor, MD.

Gaylord National Resort is the cornerstone of the 300-acre National Harbor waterfront entertainment district, located 8 miles south of Washington, DC. The resort offers visitors fine dining and casual restaurants, unique shopping experiences, an indoor pool, and a 20,000-square-foot spa and fitness center. For late-night excitement, an express elevator speeds you to the two-story rooftop Pose Ultra-Lounge, site of 2015's Surge Party, sponsored by Carpathia.

Gaylord National Resort & Convention Center is now accepting hotel reservations for Surge 2015. Through their website you can book, modify, or cancel your hotel reservations at any time.

While You Are Here

Give Your Getaway Some Character with The DreamWorks Experience at Gaylord National! Grab the family and make your Surge attendance a getaway, where you can experience new and exciting interactive adventures with the beloved characters from such DreamWorks Animation films as Shrek, Kung Fu Panda and Madagascar.


To find more things to do during your stay, check out the Gaylord National Entertainment Guide.

Getting to Gaylord National

Gaylord National is just a 15-minute drive from Reagan National Airport (DCA) and a 45-minute drive from both Dulles (IAD) and Baltimore Washington International (BWI) Airports. Shuttle and car service are available from each airport to the hotel.

Convenient to D.C., Maryland and Virginia

Gaylord National is just 15 minutes due south of Washington, D.C. — right off the Capital Beltway (I-95/I-495) — making it an easy drive from the surrounding Maryland and Virginia areas as well.

Friendly Space Policy

OmniTI is dedicated to providing a harassment-free conference experience for everyone. Harassment may include but is not limited to offensive verbal comments, sexual images in public spaces, deliberate intimidation, stalking, unwelcome following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention. Participants asked to stop any harassing behavior are expected to comply immediately. If a participant engages in harassing behavior, the conference organizers may take any action they deem appropriate, including warning the offender or expulsion from the conference.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of conference staff immediately; conference staff will be happy to provide escorts or otherwise assist attendees to feel safe for the duration of the conference. We expect participants to follow these rules at all conference venues and conference-related social events. Conference staff can be identified by special badges.

Register for Surge 2015
