A new era of SQL-development, fueled by a modern data warehouse

Categories: Data Warehouse

SQL development is not a new concept. However, as the data warehousing world shifts into a fast-paced, digital, and agile era, the demands to quickly generate reports and help guide data-driven decisions are constantly increasing. This puts new pressures on the people working behind the scenes to prepare and serve data in a consumable way to a growing audience with various levels of access credentials and technical expertise. It also puts pressure on tooling and technology platforms to enable self-serve BI in an easy, yet secure and controlled way.

Consider the following:

  • More data types to be queried, but increasingly the data resides in separate silos
  • New data types need to be quickly joined with existing data sets
  • Quick changes in business needs require quick changes or fine-tuning of previously run reports or questions
  • Data volumes now reach petabytes, and businesses demand high availability and reliability
  • Agility depends on more people having access to the same insights at the same time
  • Bottlenecks in acquiring, preparing, processing, and serving data can lead to important decisions being made with stale data

These trends and demands put stress on existing data warehouse solutions in terms of scale, efficiency, security integrations, IT budgets, and ease of access. End users and SQL developers feel the same stress in how efficiently they are expected to serve the business.

Cloudera recently launched Cloudera Data Warehouse, a modern data warehousing solution designed and optimized to relieve most of the stresses outlined above. It also comes out of the box with Hue, a widely adopted, easy-to-use SQL Developer Workbench that is specifically designed for the needs of the modern knowledge worker:

  • No need to know _where_ the data resides. Hue can help you navigate to the right data whether it is on-prem or in the cloud
  • No need to be a 100% expert on your data. Hue’s guiding intelligence can help you confidently choose the right tables, indexes, and files to work with
  • No need to be an SQL backend expert. Hue can suggest optimizations and warn on bad query design, as well as allow the query to execute where and when compute is available, whether on-prem or in the cloud
  • No need to learn a new skill. Hue, the #1 choice of SQL workers on Cloudera’s platform, delivers the same familiar experience when you transition your data warehousing workloads to or between cloud environments, or back to on-prem again

In the latest release of Cloudera Enterprise (C6), we enabled Hue 4.0 by default. It aims to expedite and simplify the common tasks of the modern SQL user. Here are some highlights:

Data Ingest

Most data is ingested through data engineering pipelines. Cloudera has a number of fully integrated tools such as Sqoop, Flume, Kafka, cloud service options, and optimized partner solutions from Informatica and Streamsets to satisfy our customers’ needs. But for an SQL user, it is also common to have “data lying around”: some flat files on S3, some tables in an external DB. Hue now offers an easy, guided way to bring in these tables or files. It connects to MySQL, S3, ADLS, and other backends to streamline the task of ingesting important additional data sets.
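To make the "flat file joined with existing data" scenario concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database purely as a stand-in for the warehouse, and the file contents, table names, and column names are all invented for illustration. Hue performs this kind of task through its guided interface, not through this code.

```python
import csv
import io
import sqlite3

# Hypothetical flat file contents, standing in for a file on S3 or ADLS.
flat_file = io.StringIO("customer_id,region\n1,EMEA\n2,APAC\n")

# In-memory SQLite stands in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# A table that already exists in the stand-in warehouse.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 2, 90.0), (12, 1, 40.0)],
)

# Ingest the flat file into a new table.
conn.execute("CREATE TABLE customer_regions (customer_id INTEGER, region TEXT)")
rows = [(int(r["customer_id"]), r["region"]) for r in csv.DictReader(flat_file)]
conn.executemany("INSERT INTO customer_regions VALUES (?, ?)", rows)

# Join the newly ingested data with the existing table.
totals = conn.execute(
    """
    SELECT c.region, SUM(o.amount)
    FROM orders AS o
    JOIN customer_regions AS c ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

print(totals)
```

Once the additional data set sits next to the existing tables, an ordinary join is enough to answer a new question such as order totals per region.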

Data Discovery and Exploration

Finding the right data to work with can be daunting in a large organization. We have implemented ways to expedite both data discovery and exploration. In the simplified data browser in C6, the end user can easily search for databases, namespaces, tables, views, collections, and even files. You can also easily preview and sample the data to confirm that you have found the relevant assets to work with.

A unique integration with our Cloudera Navigator tool (which is part of the Cloudera Data Warehousing portfolio and helps with catalogs, lineage, and auditing) allows us to show you the most commonly used tables (crowdsourced). This is a helpful indication of which table is most likely the right one to work with if you are not entirely sure. It has been estimated to shorten the discovery phase by hours.

Once you have found your datasets of interest, there are easy ways to tag them so that you can find them immediately on your next visit. The search functionality also covers tags! This is thanks to another integration with our Navigator tool that is unique to Cloudera.

Query Design

Our SQL workbench allows users to iteratively design queries. SQL users often step away or get interrupted and need to return to their work later. We provide both the ability to save queries and a query history, which speed up the process of getting started again. This also serves SQL users whenever a new business question arises, as they can start from an already saved query or copy parts of previously run queries to kickstart their new query design.

Throughout the query design process, there are helpful tools to assist you. The auto-complete functionality offers suggestions dynamically as you start typing commands and table names. It also proposes tables based on frequent use. Auto-complete is something our users can’t live without, as it speeds up their process by an estimated 10x.
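The idea of suggestions that match what you type and favor frequently used tables can be sketched in a few lines of Python. This is a simplified illustration, not Hue’s implementation: the `suggest` function, the table names, and the query history are all hypothetical.

```python
from collections import Counter


def suggest(prefix, candidates, usage, limit=5):
    """Return up to `limit` candidates that start with `prefix`, most used first."""
    prefix = prefix.lower()
    matches = [name for name in candidates if name.lower().startswith(prefix)]
    # Sort by descending usage count, then alphabetically for stable output.
    matches.sort(key=lambda name: (-usage[name], name))
    return matches[:limit]


# Hypothetical table names and query history, invented for illustration.
tables = ["sales", "sales_archive", "salaries", "customers"]
history = ["sales", "customers", "sales", "salaries", "sales"]
usage = Counter(history)  # tables never queried count as 0

print(suggest("sal", tables, usage))
# prints ['sales', 'salaries', 'sales_archive']
```

Ranking by usage count is one simple way to surface the table a user most likely wants; a real workbench would draw on richer signals such as the current database and the statement being typed.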

Optimization

If you use Navigator Optimizer (another tool in our Data Warehousing portfolio that allows our customers to quickly assess the potency of SQL statements) and query history has been configured to be uploaded, the tool can warn on “bad query design” and suggest optimizations for better performance. The proactive SQL assist via Navigator Optimizer adds resiliency to the overall service, as it prevents some memory-intensive queries from occupying resources that could serve other users and queries.
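To give a feel for what a “bad query design” warning might look like, here is a deliberately naive Python sketch. It relies on simple text checks rather than a SQL parser, and it does not reflect how Navigator Optimizer actually analyzes queries. The `lint_query` function and its two rules are assumptions chosen for illustration.

```python
import re


def lint_query(sql):
    """Return a list of warnings for a single SELECT statement (naive text checks)."""
    warnings = []
    # Collapse whitespace and lowercase so the checks are layout-insensitive.
    normalized = " ".join(sql.split()).lower()

    # Rule 1 (illustrative): SELECT * pulls every column.
    if re.search(r"\bselect\s+\*", normalized):
        warnings.append("SELECT * reads every column; list only the columns you need")

    # Rule 2 (illustrative): no filter and no row limit suggests a full scan.
    if normalized.startswith("select") and not re.search(
        r"\b(where|limit)\b", normalized
    ):
        warnings.append("no WHERE or LIMIT clause; the query may scan the whole table")

    return warnings


# Hypothetical queries, invented for illustration.
for query in ["SELECT * FROM sales", "SELECT id FROM sales WHERE region = 'EMEA'"]:
    print(query, "->", lint_query(query))
```

The first query triggers both warnings and the second triggers none. Checks like these are cheap to run before execution, which is what makes proactive warnings useful for protecting shared resources.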

Visualization

Whether your result comes from our structured query engine (Impala in our case) or our unstructured query engine (Solr in our case), it can be visualized in the same simple drag-and-drop dashboard. This allows knowledge workers to quickly and visually share their insights with others.

Sharing

There are simple ways to post newly created views or queries to GitHub, to share them with a wider audience or directly with other users. Collaboration is more important than ever as the border between traditional SQL analysts and ML-focused data scientists blurs. Assets need to be shared between these groups, and a tool such as Cloudera’s Knowledge Worker and SQL Developer Workbench enables a seamless transition.

This is the modern data warehousing trend we are observing among our customers, and it seems to be catching fire. It’s all about self-service and collaboration at multiple levels. We are enabling it at the query level with Hue.

Cloudera’s SQL user experience is based on unique, valuable integrations that provide intuitive and proactive assistance throughout the SQL developer’s journey. Our SQL workbench allows users to iteratively design, optimize, and troubleshoot queries, and its proactive SQL assist speeds up their overall process. In addition, Hue persists your work, helping SQL users start new projects faster.

The SQL Developer Workbench that comes with Cloudera’s Modern Data Warehouse is a leading discovery, exploration, development, and collaboration tool. It opens access to data otherwise siloed and locked down in traditional data warehouse systems. The ease of use and seamless value-add integrations make Hue the number one choice of modern data warehouse workers who seek to draw deeper insights from both structured and unstructured data and who aim to collaborate across Analyst and Data Science organizational borders.
