Raw Crawl Test App

Crawls and Exports Data to S3

Created By Datoin Admin
Modules 2
Rating 0.0/10 (0 votes)
Updated On Jul 20, 2015

Component Sequence (components are executed in the order of the numbers shown on their cards)

1

This module schedules a Nutch-based crawl with the help of Oozie. When the crawl data arrives in the specified queue, the module imports it into the pipeline. It takes several resources: crawl settings, a seed file, Oozie settings, and extraction settings for outlink discovery.

Created By Datoin Admin
Used In 0 Applications
Version 1.0.0-SNAPSHOT
Updated On Jul 17, 2015
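
The two-step behaviour of the crawl module above (schedule a crawl, then import whatever arrives in the queue) can be sketched roughly as follows. This is only an illustration, not Datoin's implementation: `CrawlResources`, `schedule_crawl`, `import_crawl_data` and the `submit` callback are hypothetical names, and Python's in-process `queue.Queue` stands in for whatever queue the platform actually uses. No real Nutch or Oozie API is called.

```python
import queue
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CrawlResources:
    # Hypothetical container for the four resources named in the description.
    crawl_settings: dict
    seed_urls: List[str]        # contents of the seed file
    oozie_settings: dict
    extraction_settings: dict   # settings for outlink discovery


def schedule_crawl(resources: CrawlResources, submit: Callable[[dict], str]) -> str:
    """Bundle the resources into one job description and pass it to a
    scheduler callback (assumed to wrap Oozie); returns the job id."""
    if not resources.seed_urls:
        raise ValueError("a crawl needs at least one seed URL")
    job = {
        "crawl": resources.crawl_settings,
        "seeds": list(resources.seed_urls),
        "oozie": resources.oozie_settings,
        "extraction": resources.extraction_settings,
    }
    return submit(job)


def import_crawl_data(source: "queue.Queue[dict]", pipeline: List[dict]) -> int:
    """Move every document currently waiting in the queue into the pipeline
    and return how many were imported."""
    imported = 0
    while True:
        try:
            doc = source.get_nowait()
        except queue.Empty:
            return imported
        pipeline.append(doc)
        imported += 1
```

Passing the scheduler in as a callback keeps the sketch independent of any particular Oozie client; a real deployment would replace it with the platform's own submission mechanism.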

2

This module consumes a document stream and uploads it to a configured S3 bucket. It takes the AWS S3 bucketId, accessKey, secretKey, and fileKey.

Created By Datoin Admin
Used In 2 Applications
Version 1.0.1-SNAPSHOT
Updated On Feb 22, 2017
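
For readers who want a concrete picture of the upload module above, here is a minimal sketch under stated assumptions. The field names `bucketId`, `accessKey`, `secretKey` and `fileKey` come from the module description; everything else (`S3Settings`, `make_client`, `upload_stream`, and the choice to serialize the stream as JSON Lines into a single object) is hypothetical. The only AWS calls used are boto3's `boto3.client("s3", ...)` and the S3 client's `put_object`.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class S3Settings:
    # Field names mirror the four parameters listed in the module description.
    bucketId: str
    accessKey: str
    secretKey: str
    fileKey: str


def make_client(settings: S3Settings) -> Any:
    """Create a boto3 S3 client from the configured credentials."""
    # Imported lazily so the rest of the sketch runs without boto3 installed.
    import boto3
    return boto3.client(
        "s3",
        aws_access_key_id=settings.accessKey,
        aws_secret_access_key=settings.secretKey,
    )


def upload_stream(docs: Iterable[dict], settings: S3Settings, client: Any) -> int:
    """Serialize the documents as JSON Lines (an assumed format), store them
    under one object key, and return the number of documents written."""
    lines = [json.dumps(doc, sort_keys=True) for doc in docs]
    body = "\n".join(lines).encode("utf-8")
    client.put_object(Bucket=settings.bucketId, Key=settings.fileKey, Body=body)
    return len(lines)
```

A typical call would be `upload_stream(docs, settings, make_client(settings))`. Accepting the client as an argument also makes it easy to substitute a stub when testing without AWS access.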

© 2017 Datoin · All Rights Reserved. No part of this website may be reproduced without Datoin's expressed consent.