Apache Cassandra Lunch #68: DataStax Apache Kafka Connector

6/25/2022

This resource is based on an article originally published here.

In Apache Cassandra Lunch #68: DataStax Apache Kafka Connector, we introduce the DataStax Apache Kafka Connector and discuss how to use it to connect Apache Kafka and Cassandra. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

In the recording, we cover basic information about the connector and the basic architecture of how it works, then walk through a simple Katacoda example from DataStax that shows how to use the connector. We also discuss how we use the DataStax Apache Kafka Connector in our Cassandra.Realtime repo.

The DataStax Apache Kafka Connector is open source software that works with the Kafka Connect framework. It synchronizes records from a Kafka topic with table rows in the following supported databases: DataStax Astra cloud databases, DataStax Enterprise (DSE) 4.7 and later databases, and open source Apache Cassandra® 2.1 and later databases. The connector is deployed on the Kafka Connect worker nodes and runs within the worker JVM. Workers running one or more instances of the DataStax Kafka Connector pull messages from Kafka topics and write them to a database table on the DataStax platform using the DataStax Enterprise Java driver.

- Each instance of the DataStax Apache Kafka Connector creates a single session with the cluster.
  - A single connector instance can process records from multiple Kafka topics and write to several database tables.
- Data is pulled from the Kafka topic and written to the mapped table using a CQL batch that contains multiple write statements.
- A map specification binds a Kafka topic field to a table column.
  - Fields that are omitted from the specification are not included in the write request.
  - Fields with null values are written to the database as UNSET (see nullToUnset).
  - To ensure proper ordering, all records are written using the Kafka record timestamp.
- Use multiple connectors when different global connect settings are required for different scenarios, such as writing to different clusters or datacenters.
- The DataStax Connector tasks store the offsets in config.offset.topic.
  - In the event of a failure, the DataStax Connector task resumes reading from the last recorded location.
- Ingest data from Kafka topics with records in the following data structures:
  - Primitive type values, such as integer or string
  - Complex field values in record types:
    - JSON formatted string
    - Kafka Struct
    - Avro
- Built-in SSL, LDAP/Active Directory, and Kerberos integration

More features: https://docs.datastax.com/en/kafka/doc/kafka/kafkaFeatures.html
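
To make the map specification above more concrete, here is a minimal Python sketch that assembles a connector configuration for one topic-to-table mapping. The topic, keyspace, table, column, and data center names are hypothetical. The setting names (`contactPoints`, `loadBalancing.localDc`, `topic.<topic>.<keyspace>.<table>.mapping`, `nullToUnset`) and the connector class follow the DataStax connector documentation as best we recall, so verify them against the documentation for your connector version before relying on them.

```python
import json


def build_sink_config(topic, keyspace, table, column_to_field, contact_points, local_dc):
    """Return a connector config dict that maps one Kafka topic to one table."""
    # Each mapping entry is "column=field"; fields left out of the mapping
    # are simply not included in the write request.
    mapping = ", ".join(f"{col}={field}" for col, field in column_to_field.items())
    prefix = f"topic.{topic}.{keyspace}.{table}"
    return {
        # Class name assumed for the open source connector build.
        "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
        "tasks.max": "1",
        "topics": topic,
        "contactPoints": contact_points,
        "loadBalancing.localDc": local_dc,
        f"{prefix}.mapping": mapping,
        # Treat null field values as UNSET instead of writing nulls.
        f"{prefix}.nullToUnset": "true",
    }


# Hypothetical names, for illustration only.
config = build_sink_config(
    topic="stocks_topic",
    keyspace="demo",
    table="stocks",
    column_to_field={"symbol": "key", "ts": "value.ts", "price": "value.price"},
    contact_points="127.0.0.1",
    local_dc="datacenter1",
)
print(json.dumps(config, indent=2))
```

Here `key` refers to the Kafka record key and `value.<name>` to a field inside the record value. Any column not listed in the mapping is left out of the write.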

The demo portion of Apache Cassandra Lunch #68: DataStax Apache Kafka Connector is split into two parts, as mentioned above. In the first part, we cover a DataStax Katacoda scenario in which we create a Kafka topic, configure and start a Kafka Connect worker, download and configure the DataStax Kafka Connector, and push data from the Kafka topic to a Cassandra instance. In the second part, we take a look at Cassandra.Realtime and discuss how that walkthrough uses the same basics we covered in the Katacoda scenario. If you want a more in-depth discussion and video demo, be sure to watch the embedded YouTube video below!
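
As a rough illustration of the "configure the connector" step, the sketch below prepares a request that registers a sink connector with a Kafka Connect worker through the Kafka Connect REST API (`POST /connectors`). This is one common approach when the worker runs in distributed mode, and the demo itself may start the connector differently. The worker URL, connector name, and configuration values are hypothetical.

```python
import json
import urllib.request


def build_register_request(connect_url, name, config):
    """Build (but do not send) a POST /connectors request for Kafka Connect."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical worker address and connector settings, for illustration only.
request = build_register_request(
    "http://localhost:8083",
    "cassandra-sink-demo",
    {
        "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
        "tasks.max": "1",
        "topics": "stocks_topic",
    },
)
print(request.get_method(), request.full_url)

# To actually submit it to a running worker:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the payload before touching a live worker.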

Resources

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!
