Loom provides dynamic dataset management for Hadoop
Data scientists and data engineers access Loom through the Workbench using their standard web browsers. Data scientists can also access Loom through their standard R environment using our RLoom package. Third-party applications and application developers can access Loom through its RESTful API. The Loom server and registry are installed together alongside a Hadoop cluster. Loom supports all major Hadoop distributions.
Loom's ActiveScan automatically detects, parses, and profiles HDFS files
Loom's ActiveScan framework automates the bulk of the work of managing datasets as they land in Hadoop. ActiveScan dynamically monitors the Hadoop file system and registers new files and directories as potential data sources. ActiveScan's classifier detects the file type, enabling ActiveScan's formatter to detect and register a schema and an associated file reader. This, in turn, enables users to convert data sources into Loom Datasets so that they can be processed further with Loom transformations.
- Automatically detects new files and directories
- Automatically determines the file type and format
- Automatically defines and registers schemas
- Stores all metadata in a centralized registry
Click here to read more about ActiveScan.
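ActiveScan's classifier and formatter are internal to Loom, but the detect-classify-register flow it automates can be illustrated conceptually. The following Python sketch is an illustrative assumption, not Loom's implementation: the `classify` heuristics, the header-row assumption in `infer_schema`, and the plain-dict registry are all stand-ins.

```python
import csv
import io
import json

# Illustrative only: a toy detect -> classify -> register flow in the
# spirit of ActiveScan. Loom's actual classifier, formatter, and
# registry are internal to the product.
def classify(sample: bytes) -> str:
    """Guess a file type from the first bytes of a file."""
    text = sample.decode("utf-8", errors="replace")
    first_line = text.splitlines()[0] if text else ""
    if first_line.startswith(("{", "[")):
        return "json"
    if "," in first_line:
        return "csv"
    if "\t" in first_line:
        return "tsv"
    return "text"

def infer_schema(sample: bytes, file_type: str) -> list:
    """Derive column names for delimited files (header row assumed)."""
    if file_type in ("csv", "tsv"):
        delim = "," if file_type == "csv" else "\t"
        reader = csv.reader(io.StringIO(sample.decode()), delimiter=delim)
        return next(reader, [])
    return []

def register(path: str, sample: bytes, registry: dict) -> dict:
    """Record a newly discovered file as a potential data source."""
    file_type = classify(sample)
    entry = {"path": path, "type": file_type,
             "schema": infer_schema(sample, file_type)}
    registry[path] = entry
    return entry

registry = {}
entry = register("/data/landing/orders.csv",
                 b"order_id,customer,total\n1,acme,9.99\n", registry)
print(json.dumps(entry))
```

In Loom itself these steps run continuously against HDFS, and the resulting entries land in the centralized metadata registry rather than an in-memory dict.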
Loom Datasets are Actionable
Loom Datasets give users a consistent and simple view of their Hadoop data
Loom provides a layer of abstraction above the Hadoop file system. Files and groups of files discovered and parsed by ActiveScan are exposed to users as Loom Datasets. Loom Datasets are at the core of Loom's metamodel, which gives users a consistent and simple way to manage and interact with Hadoop data.
- Loom Datasets have known schemas
- Loom Datasets have known formats
- Loom Datasets are defined automatically by ActiveScan
- Loom Datasets are actionable - transformations and analysis can be run through Hive and Loom
Click here to read more about Loom Datasets.
Dynamic Lineage Graphs
Loom tracks lineage of all dataset transformations in Hadoop
The Loom Workbench provides a simple web interface for data scientists and data engineers to perform transformations over their Hadoop datasets. Loom tracks the lineage of all transformations to provide transparency and auditability of all dataset preparation in Hadoop. Loom is integrated with Hive, providing users with the full power of the Hive engine.
- Dataset munging - character replacement, white-space removal, de-duplication, datatype conversion...
- SQL operations - joins, filters, aggregates, UDFs...
- Integration with third-party tools
- Loom tracks all lineage and metadata
Click here to read more about dataset transformations and lineage in Loom.
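To make the munging steps above concrete, here is a minimal Python sketch of character replacement, white-space removal, and de-duplication applied to raw rows. This is purely illustrative: Loom performs the equivalent operations at scale through Hive, not with Python code like this.

```python
# Illustrative only: three of the munging steps named above, shown as
# plain Python transforms. Loom runs the equivalents through Hive.
def munge(rows):
    cleaned = []
    seen = set()
    for raw in rows:
        value = raw.replace(";", ",")      # character replacement
        value = " ".join(value.split())    # strip/collapse white-space
        if value in seen:                  # de-duplication
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned

rows = ["  alice;42 ", "alice,42", "bob,7\n"]
print(munge(rows))  # ['alice,42', 'bob,7']
```

Datatype conversion would follow the same pattern: a per-value cast expressed once and applied across the dataset.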
The Loom API provides a simple way to integrate Hadoop into an existing enterprise architecture
All Loom functionality is exposed through a simple RESTful API. Third-party applications can use this API to access and register metadata about datasets, transformations, and lineage. The API can also be used for direct access to data for use in analysis or in Big Data applications.
- Publish metadata and data through Loom API
- Provide simple access point into Hadoop for existing tools
- Out-of-the-box integration with R
Click here to read more about the Loom API.
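As a sketch of how a third-party tool might drive the API, the snippet below builds a dataset-metadata URL and a registration payload. The host, port, endpoint paths, and payload fields are illustrative assumptions for this example, not Loom's documented routes; consult the Loom API reference for the real ones.

```python
import json
from urllib.parse import urljoin

# Illustrative only: the base URL and endpoint paths below are
# assumptions made for this example, not Loom's documented API.
BASE = "http://loom-server:8080/api/"

def dataset_url(name: str) -> str:
    """URL a client might GET to read a dataset's registered metadata."""
    return urljoin(BASE, f"datasets/{name}")

def registration_payload(name: str, path: str, schema: list) -> str:
    """JSON body a client might POST to register dataset metadata."""
    return json.dumps({"name": name, "source": path, "schema": schema})

print(dataset_url("web_logs"))
print(registration_payload("web_logs", "/data/web_logs",
                           ["ts", "url", "status"]))
```

An actual client would issue these requests with any standard HTTP library, which is what makes the REST interface a simple access point for existing enterprise tools.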
| Requirement | Supported |
| --- | --- |
| Hadoop Distributions | Cloudera (CDH3, CDH4), HortonWorks, MapR, Intel, Apache |
| OS | Latest versions of Ubuntu/CentOS |
| Analytics | SQL, HiveQL, R, REST API |
| Browsers | Firefox x.x, Chrome x.x |