The 7 Common Pitfalls of Big Data


If you (and your organization) feel like you're being left behind as the big data train pulls out of the station, you may be tempted to throw together a big data plan quickly to get back on track. However, doing so in haste and without the right resources can lead to the same mistakes others have made when starting a big data engagement.

To help you avoid those mistakes, let's look at the seven most common pitfalls you may encounter:

Not Knowing How to Measure Success

The key to any big data project is to have goal-driven metrics. Many companies still struggle to find actionable insights in their data because they do not establish clear goals from the start. While many metrics are fascinating to engineers, they have little to no value to the executive team. Executive metrics are usually revenue- or business-driven, so design your big data analysis and visualizations with that in mind.

Lack of an Executive Sponsor

You need at least one major influencer on board from the start of the project who can respond to pushback and champion the effort. Although an Executive Sponsor is not involved in day-to-day activities the way a project manager is, they stay engaged with the big data project team and make sure the project stays on track.

Failure to Focus on Specific Metrics

You need to come up with 10 to 30 specific metrics that provide real measurements to the business. The key is deciding which questions your organization actually wants answered. Don't try to measure everything: you'll overwhelm people who look at the data only periodically or for the first time, and they will not see the value in the reporting.

Failing to Come up with a Data Security Plan

As you start to aggregate your data, you need to keep security front and center. Answering the following simple questions can prevent a lot of headaches later:

  • Who needs access to the data?
  • Is there any personal information contained in the data?
  • If the data is anonymized, would its value be lost?
  • Should the data be encrypted? How much processing time will that add?
  • Can we encrypt at all stages (in transit, at rest, in use)?
  • Are we logging who accesses the data?
  • Does each user have a separate login, rather than everyone sharing one master login?

NOTE: This is not a complete big data security questionnaire, just a few questions to get you thinking.

Failure to Come up with a Data Archival Plan

When looking at your big data project holistically, you need to factor in considerations such as Total Cost of Ownership (TCO). Data that is always accessible, even on cloud storage, can be more expensive to maintain than infrequently accessed data. If you only analyze a particular month or quarter, there is no need to keep the full year, or all of your data, on top-tier SSD storage.

Load your data as your business case requires. This also makes queries run faster, so you get more value from your workforce: your people won't constantly wait on slow-running jobs that sift and filter through data that will never show up in the results anyway.
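To make the archival idea concrete, here is a minimal sketch, assuming data is stored in monthly partitions. It assigns each partition a storage tier by age and, separately, selects only the partitions a query's date window actually needs. The tier names (`hot`, `warm`, `cold`), the 3- and 12-month thresholds, and the `Partition`, `assign_tier`, and `partitions_in_window` names are all hypothetical; map them to your own platform's storage classes and retention rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

# Hypothetical tier labels; map them to the storage classes your platform
# actually offers (for example, SSD-backed vs. archival object storage).
HOT, WARM, COLD = "hot", "warm", "cold"


@dataclass
class Partition:
    name: str
    period_start: date  # first day of the month this partition covers


def months_between(later: date, earlier: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def assign_tier(p: Partition, today: date,
                hot_months: int = 3, warm_months: int = 12) -> str:
    """Pick a storage tier by partition age; the thresholds are assumptions."""
    age = months_between(today, p.period_start)
    if age < hot_months:
        return HOT   # recent data that analysts query constantly
    if age < warm_months:
        return WARM  # queried occasionally, so cheaper storage is fine
    return COLD      # archived; restore only when a business case needs it


def partitions_in_window(parts: Iterable[Partition],
                         start: date, end: date) -> List[Partition]:
    """Keep only partitions in [start, end) so a query never scans the rest."""
    return [p for p in parts if start <= p.period_start < end]


if __name__ == "__main__":
    today = date(2024, 6, 15)
    # Eighteen monthly partitions: 2023-01 through 2024-06.
    parts = [Partition(f"{y}-{m:02d}", date(y, m, 1))
             for y in (2023, 2024) for m in range(1, 13)
             if date(y, m, 1) <= today]
    for p in parts:
        print(p.name, assign_tier(p, today))
    q1 = partitions_in_window(parts, date(2024, 1, 1), date(2024, 4, 1))
    print("Q1 query scans:", [p.name for p in q1])
```

A quarterly report in this sketch touches three partitions instead of eighteen, which is the "load what the business case requires" point in miniature; most warehouses and query engines offer their own partitioning and storage-class features that do this for you.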

Asking the Wrong Questions

Remember the old computing adage GIGO (Garbage In, Garbage Out)? If you ask the wrong questions of the data, you will get potentially meaningless output, and your project will be seen as having little or no value.

The data needs to be cleansed and then queried to return the answers to specific business questions that are predefined at the start of the project. This will provide the maximum value to one or more organizational units.

Focusing on Technology instead of Business Need

This is where well-defined business questions and the Executive Sponsor come into play. Technologists can quickly get lost in all of the new technologies and features that keep emerging in the data science space.

The team lead or Executive Sponsor knows what the business wants to accomplish, based on the metrics and KPIs agreed on at the start. Their job is to keep the team focused, by asking key questions, when the next shiny new piece of technology comes along.

Consider the following:

    1. How much time would a refactor take vs. how much processing time would this new technology save us?
    2. Is this new technology a cost saving or a capital investment?
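The first question above is essentially a break-even calculation. The sketch below is one simple way to frame it, assuming the refactor is a one-time labor cost and the benefit is a steady monthly saving in compute spend plus analyst time no longer spent waiting on jobs. The function name, its parameters, and the sample figures are hypothetical, and a real evaluation would also weigh risk, licensing, and ongoing maintenance.

```python
import math


def break_even_months(refactor_hours: float, hourly_rate: float,
                      compute_hours_saved_per_month: float,
                      compute_cost_per_hour: float,
                      analyst_hours_saved_per_month: float = 0.0) -> float:
    """Months until a one-time refactor pays for itself (simplified model).

    Assumes one blended hourly rate for engineers and analysts and constant
    monthly savings; both are simplifying assumptions, not rules.
    """
    one_time_cost = refactor_hours * hourly_rate
    monthly_savings = (compute_hours_saved_per_month * compute_cost_per_hour
                       + analyst_hours_saved_per_month * hourly_rate)
    if monthly_savings <= 0:
        return math.inf  # the new technology never pays back the refactor
    return one_time_cost / monthly_savings


if __name__ == "__main__":
    # Hypothetical: 160 refactor hours at $100/h = $16,000 one-time cost;
    # 500 compute hours at $4/h plus 20 analyst hours saved = $4,000/month.
    print(break_even_months(160, 100, 500, 4, 20))  # 4.0
```

If the payback period is longer than the technology is likely to stay current, the shiny new tool probably isn't worth the detour.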

These common pitfalls are obviously not a comprehensive list of challenges with big data. There are countless more that could have a negative impact on your big data investment. However, big data can be a very rewarding endeavor for your company if it's done right. Make sure you avoid these common pitfalls when you're thinking about a big data plan or actively executing one. If you need assistance with a big data roadmap and architecture plan, please feel free to reach out to us for more information!

Always be in the know: subscribe to the Relus Cloud Blog!