The Apache Software Foundation Announces Apache® Beam™ v2.0.0

The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Beam™ v2.0.0, the first stable release of the unified programming model for both batch and streaming Big Data processing.

An Apache Top-Level Project (TLP) since December 2016, Beam includes Java and Python software development kits (SDKs) for defining data processing pipelines, and runners for executing them on Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow, among other execution engines.

Apache Beam has its roots in Google’s internal work on data processing over the last decade, evolving from the initial MapReduce system, through FlumeJava and MillWheel, into Google Cloud Dataflow v1.x, which defined the unified programming model that became the heart of Apache Beam.

“The first stable release is an important milestone for the Apache Beam community,” said Davor Bonaci, Vice President of Apache Beam. “This is a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, making Beam suitable for enterprise deployment.”

Apache Beam v2.0.0 improves user experience across the project, focusing on seamless portability across execution environments, including engines, operating systems, on-premise clusters, cloud providers, and data storage systems. Other highlights include:

  • API stability and future compatibility within this major version;
  • Stateful data processing paradigms that unlock efficient, data-dependent computations;
  • Support for user-extensible file systems, with built-in support for Hadoop Distributed File System, among others; and
  • A metrics subsystem for deeper insight into pipeline execution.

Apache Beam is in use at Google Cloud, PayPal, and Talend, among others.

“Apache Beam is a mature data processing API for the enterprise, with powerful semantics that solve real-world challenges of stream processing,” said Tomer Pilossof, Big Data Manager at PayPal. “With Beam, we provide data processing solutions for a wide range of customers within the PayPal organization.”

“We at Talend are thrilled to have contributed to Apache Beam reaching the 2.0.0 milestone and its first official stable release,” said Laurent Bride, Chief Technology Officer at Talend. “Apache Beam is now part of the foundation of Talend products. Recently, we released Talend Data Preparation for Big Data, which leverages Beam to create transformation pipelines that are portable across many execution engines. Later this year, we plan to deliver Talend Data Streams, taking the Apache Beam integration one step further by utilizing its powerful streaming semantics. Whether for batch, streaming, or real-time use cases, Apache Beam is a powerful framework that delivers the flexibility and advanced functionality our customers need.”

“We congratulate the Apache Beam community for reaching the key milestone of a first stable release,” said William Vambenepe, Lead Product Manager for Big Data, Google Cloud. “We look forward to our Google Cloud Dataflow customers taking full advantage of Beam’s powerful programming model and newest features to run their data processing pipelines on Google Cloud.”

Apache Beam v2.0.0 is making its debut at Apache: Big Data, taking place this week in Miami, FL, with four sessions featuring Apache Beam. Apache Beam will also be highlighted at numerous face-to-face meetups and conferences, including the Future of Data San Jose meetup, Strata Data Conference London, Berlin Buzzwords, and DataWorks Summit San Jose.