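The Kafka 1.0 Ecosystem

This post lists the tools that the official Kafka wiki describes as integrating with Kafka; together they make up the Kafka ecosystem. How well each tool actually integrates with Kafka has not been evaluated here.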


0.Kafka Streams

Updated: June 2, 2016

Kafka Streams was first introduced a few months ago as a technical preview in the Confluent Platform, and it is now available in Apache Kafka 0.10.0.0.

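Kafka Streams is a library that gives Apache Kafka stream-processing capabilities. It includes a high-level API for common stream operations (such as joining, filtering, and aggregating records), which lets developers build powerful stream-processing applications quickly. Kafka Streams supports both stateful and stateless processing and can be deployed in many environments: a Kafka Streams application can run on YARN, on Mesos, in Docker containers, or even be embedded directly in an existing Java application.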
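To make the difference between stateless and stateful processing concrete, here is a minimal sketch in plain Python. It does not use the Kafka Streams API (which is a Java library) and does not talk to Kafka at all; the `(key, value)` record shape, the helper names `filter_records` and `count_by_key`, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (key, value); a hypothetical record shape


def filter_records(records: Iterable[Record],
                   predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Stateless step: each record is judged on its own, nothing is remembered."""
    for record in records:
        if predicate(record):
            yield record


def count_by_key(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Stateful step: keeps a running count per key and emits every update."""
    counts: Dict[str, int] = defaultdict(int)
    for key, _value in records:
        counts[key] += 1
        yield key, counts[key]


# Hypothetical input: page-view events keyed by user name.
events = [("alice", "/home"), ("bob", "/cart"), ("alice", "/cart"), ("alice", "/home")]

cart_views = list(filter_records(events, lambda r: r[1] == "/cart"))
updates = list(count_by_key(events))
```

The filter needs no memory between records, whereas the count has to keep per-key state for as long as the stream runs; that state is what a stream-processing library has to manage on the application's behalf.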


Official site last updated: April 2016

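1.Data import/export tools (Kafka Connect)

Kafka Connect is a tool bundled with Kafka 0.9 and later. Through connectors, it can import large volumes of data from other systems into Kafka and export data from Kafka to other systems. For details, see the official Kafka Connect documentation and my article on the topic.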
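As a rough illustration of what a connector definition looks like, the sketch below builds the JSON body that a Kafka Connect worker running in distributed mode accepts when a connector is created through its REST interface (`POST /connectors`). It uses the `FileStreamSourceConnector` example connector that ships with Kafka. The helper name `file_source_connector`, the connector name, the file path, and the topic are assumptions chosen for illustration, and the sketch only prints the payload rather than sending it anywhere.

```python
import json


def file_source_connector(name: str, path: str, topic: str) -> dict:
    """Build the JSON body describing a file source connector."""
    return {
        "name": name,
        "config": {
            # Example connector bundled with Kafka: reads lines from a file.
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,    # hypothetical input file
            "topic": topic,  # hypothetical destination topic
        },
    }


payload = file_source_connector("local-file-source", "/tmp/test.txt", "connect-test")
print(json.dumps(payload, indent=2))
```

A sink connector is described the same way, with a sink connector class and the topics to read from in place of the source settings.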

2.Kafka distributions from commercial platforms (Distributions & Packaging)

3.Integration with stream-processing frameworks (Stream Processing)

  • Kafka Streams - Built-in library as part of the Kafka project.
  • Storm - A stream-processing framework.
  • Samza - A YARN-based stream processing framework.
  • Storm Spout - Consume messages from Kafka and emit as Storm tuples
  • Kafka-Storm - Kafka 0.8, Storm 0.9, Avro integration
  • SparkStreaming - Kafka receiver supports Kafka 0.8 and above
  • Flink - Apache Flink has an integration with Kafka
  • IBM Streams - A stream processing framework with Kafka source and sink to consume and produce Kafka messages

4.Integration with Hadoop (Integration)

  • Kafka Connect sink - A sink for Kafka’s connector framework.
  • Camus - LinkedIn’s Kafka=>HDFS pipeline. This one is used for all data at LinkedIn, and works great.
  • Kafka Hadoop Loader - A different take on Hadoop loading functionality from what is included in the main distribution.
  • Flume - Contains Kafka Source (consumer) and Sink (producer)
  • KaBoom - A high-performance HDFS data loader

5.Integration with search and query systems (Search and Query)

  • ElasticSearch - This project, Kafka Standalone Consumer, reads messages from Kafka, processes them, and indexes them in ElasticSearch. There are also several Kafka Connect connectors for ElasticSearch.
  • Presto - The Presto Kafka connector allows you to query Kafka in SQL using Presto.
  • Hive - Hive SerDe that allows querying Kafka (Avro only for now) using Hive SQL

6.Kafka monitoring and management (Management Consoles)

  • Kafka Manager - A tool for managing Apache Kafka.
  • kafkat - Simplified command-line administration for Kafka brokers.
  • Kafka Web Console - Displays information about your Kafka cluster including which nodes are up and what topics they host data for.
  • Kafka Offset Monitor - Displays the state of all consumers and how far behind the head of the stream they are.
  • Capillary – Displays the state and deltas of Kafka-based Apache Storm topologies. Supports Kafka >= 0.8. It also provides an API for fetching this information for monitoring purposes.
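Several of the consoles above report how far consumers are behind the head of the stream. That figure is commonly called consumer lag and is computed per partition as the log end offset minus the consumer's committed offset. The sketch below shows only that arithmetic; it does not query Kafka or any of the listed tools, and the function name `consumer_lag` and the hard-coded offsets are assumptions for illustration.

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log end offset minus committed offset."""
    lag = {}
    for partition, end in log_end_offsets.items():
        # Assumption: a partition with no committed offset is treated as offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 120, 1: 95, 2: 40}
committed = {0: 100, 1: 95}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
```

A lag that keeps growing on one partition usually means the consumer assigned to it cannot keep up or has stopped, which is the kind of condition these consoles are meant to surface.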

7.AWS Integration

  • Automated AWS deployment
  • Kafka -> S3 Mirroring tool from Pinterest.
  • Alternative Kafka->S3 Mirroring tool

8.Logging

  • syslog (1M)
  • syslog producer: A producer that supports both raw data and protobuf with metadata for deep analytics usage.
  • syslog-ng (https://syslog-ng.org/) is one of the most widely used open source log collection tools, capable of filtering, classifying, parsing log data and forwarding it to a wide variety of destinations. Kafka is a first-class destination in the syslog-ng tool; details on the integration can be found at https://czanik.blogs.balabit.com/2015/11/kafka-and-syslog-ng/ .
  • klogd A python syslog publisher
  • klogd2 A java syslog publisher
  • Tail2Kafka A simple log tailing utility
  • Fluentd plugin Integration with Fluentd
  • Remote log viewer
  • LogStash integration Integration with LogStash and Fluentd
  • Syslog Collector written in Go
  • Klogger A simple proxy service for Kafka.
  • fuse-kafka: A file system logging agent based on Kafka
  • omkafka: Another syslog integration, this one in C and uses librdkafka library
  • logkafka Collect logs and send lines to Apache Kafka

  • Flume - Kafka plugins

    • Flume Kafka Plugin - Integration with Flume
    • Kafka as a sink and source in Flume - Integration with Flume

9.Metrics

  • Mozilla Metrics Service - A Kafka and Protocol Buffers based metrics and logging system
  • Ganglia Integration
  • SPM for Kafka
  • Coda Hale Metric Reporter to Kafka

10.Packing and Deployment

11.Kafka Camel Integration

12.Miscellaneous (Misc.)

  • Kafka Websocket - A proxy that interoperates with websockets for delivering Kafka data to browsers.
  • KafkaCat - A native, command line producer and consumer.
  • Kafka Mirror - An alternative to the built-in mirroring tool
  • Ruby Demo App
  • Apache Camel Integration
  • Infobright integration
  • Riemann Consumer of Metrics
  • stormkafkamom – curses-based tool which displays state of Apache Storm based Kafka consumers (Kafka 0.7 only).
