Womply
Womply is a next-generation payments data company. We partner with merchant-facing companies, including credit card processors and acquirers, to analyze data for millions of merchants across the US.
At Womply I’m a lead developer who gets to work on a wide variety of technologies. On any given day I might be automating our VM provisioning with Chef, writing an ETL process in Java/Scala, or implementing a new feature in Ruby on Rails.
Cassandra and Spark
We use Cassandra for dashboards that allow merchants to analyze their revenue and compare it against merchant aggregates in their area and/or vertical.
We evaluated HBase pretty heavily but found that its operational demands were much greater than Cassandra’s.
We are currently using Cassandra 2.0.8 in combination with Apache Spark 1.0. The revenue data we collect is stored directly in Cassandra. From there we use Apache Spark to precompute several time series aggregates and persist them back to Cassandra, with partition keys such as category, city/state, and nearest merchants.
Our revenue data needs to be aggregated into several different kinds of time series, which is an excellent fit for Cassandra.
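To make that concrete, the precomputation step has roughly the shape below. This is a minimal sketch rather than our production job: the Transaction case class, its field names, and the toy data are all hypothetical, and the Cassandra read/write is stubbed out, since in practice that goes through Calliope or the Cassandra Hadoop formats.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark 1.0)

    // Hypothetical shape of a raw revenue record; the real schema differs.
    case class Transaction(merchantId: String, category: String,
                           city: String, state: String,
                           day: String, amountCents: Long)

    object DailyCategoryAggregates {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("daily-category-aggregates"))

        // In production this RDD is read from Cassandra (via Calliope or
        // the Cassandra Hadoop input format); a local stub keeps the
        // sketch self-contained.
        val transactions = sc.parallelize(Seq(
          Transaction("m1", "restaurants", "Portland", "OR", "2014-07-01", 1250L),
          Transaction("m2", "restaurants", "Portland", "OR", "2014-07-01", 980L)))

        // Roll raw transactions up into one time series point per
        // (category, day); the same pattern applies to our other
        // partition keys such as city/state and nearest merchants.
        val byCategoryDay = transactions
          .map(t => ((t.category, t.day), t.amountCents))
          .reduceByKey(_ + _)

        // The real job persists these aggregates back to Cassandra
        // instead of printing them.
        byCategoryDay.collect().foreach(println)
        sc.stop()
      }
    }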
Top Benefits
Cassandra was a good fit for our product requirements; storing time series data is one of Cassandra’s sweet spots and is a core feature of our product.
It’s also relatively easy to manage from an operations perspective. On top of that, having no single point of failure (SPOF) is great, it’s easy to scale up and down, and the Chef cookbook support is a real plus.
Lastly, Cassandra is very affordable: our cluster runs entirely on open source software, so there are no licensing fees to pay.
Deployment
We have one Cassandra data center that also runs Spark locally on each node. Spark lets you set maximum core and RAM usage thresholds, so we’ve set the core cap to half of the virtual cores on each EC2 instance. This has allowed us to run Spark jobs on the same nodes that serve real-time queries from our web application, with minimal performance degradation.
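For reference, the throttling looks roughly like the sketch below. The numbers are hypothetical, and on a standalone cluster the per-node core cap can equivalently be set with SPARK_WORKER_CORES in spark-env.sh; the point is simply that Spark’s caps leave headroom for Cassandra on the same boxes.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical caps: leave half of each node's vCPUs, and most of
    // its RAM, for Cassandra's real-time query load.
    val conf = new SparkConf()
      .setAppName("revenue-aggregates")
      .set("spark.cores.max", "16")       // total cores the job may claim cluster-wide
      .set("spark.executor.memory", "4g") // bound the executor JVM heap

    val sc = new SparkContext(conf)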
Advice
If you need to run massively parallel processing jobs, consider using Spark instead of MapReduce. Spark lets you write jobs more quickly, and they run much faster. Getting it installed and integrated with Cassandra is also much easier and cheaper than doing the same with Hadoop.
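As a taste of the difference, the per-key aggregation below is a handful of lines in Spark, where the classic MapReduce version needs a Mapper, a Reducer, and a driver class. The data is a toy stand-in, purely for illustration.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark 1.0)

    val sc = new SparkContext(new SparkConf().setAppName("spark-vs-mapreduce"))

    // (merchant, revenue-in-cents) pairs; stand-ins for real input.
    val revenue = sc.parallelize(Seq(("m1", 1250L), ("m2", 980L), ("m1", 400L)))

    // The entire map + shuffle + reduce pipeline in one line.
    val totals = revenue.reduceByKey(_ + _)

    totals.collect().foreach(println)
    sc.stop()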
Community
Our experience with the community has been really good. While setting up our initial proof of concept I found a bug in the Cassandra Hadoop input format. The patch I submitted was merged to trunk within 24 hours and was included in a release a few days later. We’ve also had similarly good experiences with other open source projects in the Cassandra orbit, like Calliope and the Cassandra Chef cookbook.