Raw Crawl Test App

Crawls and Exports Data to S3

Created By Datoin Admin
Modules 2
Rating 0.0/10 (0 votes)
Updated On Jul 20, 2015

Component Sequence (components run in the order of the numbers shown on their cards)


This module schedules a Nutch-based crawl through Oozie. When the crawl data arrives in the specified queue, the module imports it into the pipeline. It takes four resources: crawl settings, a seed file, Oozie settings, and extraction settings for outlink discovery.

Created By Datoin Admin
Used In 0 Applications
Version 1.0.0-SNAPSHOT
Updated On Jul 17, 2015
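
The crawl module's description lists four resources: crawl settings, a seed file, Oozie settings, and extraction settings. The sketch below is one possible way to group them and to render a seed file, assuming the usual Nutch convention of one URL per line. The class name, field names, and helper function are hypothetical and are not taken from the Datoin Platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class CrawlResources:
    """Hypothetical grouping of the four resources the crawl module takes."""
    seed_urls: List[str]
    crawl_settings: Dict[str, str] = field(default_factory=dict)
    oozie_settings: Dict[str, str] = field(default_factory=dict)
    extraction_settings: Dict[str, str] = field(default_factory=dict)


def render_seed_file(urls: List[str]) -> str:
    """Return seed-file text with one absolute http(s) URL per line.

    Blank entries are skipped and duplicates are dropped, keeping first-seen order.
    """
    lines: List[str] = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        if url not in lines:
            lines.append(url)
    return "".join(line + "\n" for line in lines)
```

Validating the seed list before scheduling is a design choice of this sketch: a malformed URL fails early instead of surfacing later inside the crawl.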


This module consumes a document stream and uploads it to a configured S3 bucket. It takes the AWS S3 bucketId, accessKey, secretKey, and fileKey.

Created By Datoin Admin
Used In 2 Applications
Version 1.0.1-SNAPSHOT
Updated On Feb 22, 2017
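
The S3 export module takes four parameters: bucketId, accessKey, secretKey, and fileKey. The following is a minimal sketch of how they could map onto an S3 upload, assuming the boto3 library and assuming the document stream is written as a single newline-delimited object. The class and function names are hypothetical, and the module's actual serialization format is not documented here.

```python
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class S3ExportConfig:
    """Hypothetical holder for the four parameters the module takes."""
    bucket_id: str   # bucketId
    access_key: str  # accessKey
    secret_key: str  # secretKey
    file_key: str    # fileKey: the object key to write within the bucket

    def validate(self) -> None:
        """Raise ValueError if any parameter is empty."""
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"missing S3 export parameter: {name}")


def make_client(config: S3ExportConfig):
    """Create a boto3 S3 client from the configured credentials."""
    import boto3  # third-party dependency, imported only when a real client is needed
    return boto3.client(
        "s3",
        aws_access_key_id=config.access_key,
        aws_secret_access_key=config.secret_key,
    )


def upload_documents(client, config: S3ExportConfig, documents: Iterable[str]) -> int:
    """Upload the documents as one newline-delimited object; return the byte count.

    Joining documents with newlines is an assumption made for this sketch.
    """
    config.validate()
    body = b"".join(doc.encode("utf-8") + b"\n" for doc in documents)
    client.put_object(Bucket=config.bucket_id, Key=config.file_key, Body=body)
    return len(body)
```

Passing the client in as an argument keeps the upload logic testable without network access; in real use, `make_client(config)` would supply it.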

© 2017 Datoin · All Rights Reserved. No part of this website may be reproduced without Datoin's express consent.