
Benefits of using Datoin Platform for Crawling

Crawling, or data acquisition, is just one component in the complete extraction pipeline. The Datoin platform gives us quick configuration of extractions and easier implementation of custom business logic using existing off-the-shelf components. Thus, we can build proofs of concept quickly and deliver faster than any custom solution. And did we forget to say it is scalable too? The Datoin platform is built on top of Apache Hadoop and a customised Nutch crawler.

Extraction and Analysis

Transformation

Transform the differing schemas of various websites into one standardised schema that you can readily consume in your app.
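
For illustration, here is a minimal Python sketch of the idea, with hypothetical field names for two source sites; on the platform the mapping itself is configured rather than hand-coded.

```python
# A minimal sketch of schema transformation. The field names for
# "site_a" and "site_b" are hypothetical examples.
def to_standard_schema(record, source):
    """Map a site-specific record dict onto one standardised schema."""
    if source == "site_a":
        return {"name": record["restaurant_name"],
                "city": record["location"],
                "rating": float(record["stars"])}
    if source == "site_b":
        return {"name": record["title"],
                "city": record["address"]["city"],
                "rating": record["score"] / 2.0}
    raise ValueError(f"unknown source: {source}")

print(to_standard_schema(
    {"title": "Cafe X", "address": {"city": "Pune"}, "score": 9}, "site_b"))
```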

Filtering

Weed out undesired records, unwanted categories, and records with incomplete information.
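
A rough sketch of what such a filter looks like, with hypothetical required fields and blocked categories:

```python
# A record-filtering sketch: drop incomplete records and unwanted
# categories. Field names and categories are hypothetical.
REQUIRED_FIELDS = {"name", "city", "rating"}
BLOCKED_CATEGORIES = {"advertisement", "duplicate-listing"}

def keep(record):
    """Keep only complete records outside the unwanted categories."""
    complete = all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS)
    wanted = record.get("category") not in BLOCKED_CATEGORIES
    return complete and wanted

records = [
    {"name": "Cafe X", "city": "Pune", "rating": 4.5, "category": "restaurant"},
    {"name": "", "city": "Pune", "rating": 3.0, "category": "restaurant"},
]
print([r for r in records if keep(r)])  # only the first record survives
```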

De-duplication

Get only unique records by removing redundant ones.
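
For example, a simple key-based de-duplication pass might look like this (the key fields are hypothetical):

```python
# A minimal de-duplication sketch: keep the first record seen for each
# normalised key.
def dedupe(records, key_fields=("name", "city")):
    seen, unique = set(), []
    for r in records:
        key = tuple(str(r.get(f, "")).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

print(dedupe([{"name": "Cafe X", "city": "Pune"},
              {"name": "cafe x ", "city": "PUNE"}]))  # one record remains
```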

Customised Extraction

A site-specific or site-agnostic scraping tool that can be customised to your data needs.
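
As a rough illustration, a site-specific extractor boils down to selectors like these (requests and BeautifulSoup are used here only for illustration, and the selectors are made up and would be tuned per site):

```python
# A sketch of site-specific extraction with CSS selectors.
import requests
from bs4 import BeautifulSoup

def extract_listing(url):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # The selectors below are hypothetical and site-dependent.
    return {
        "title": soup.select_one("h1.listing-title").get_text(strip=True),
        "price": soup.select_one("span.price").get_text(strip=True),
        "description": soup.select_one("div.description").get_text(strip=True),
    }
```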

Predefined Extraction

NLP and machine learning components for extracting common verticals such as restaurants, products, and news help you get started.

Classification

Classify your extracted data into predefined categories using our trained machine learning components, e.g. news category, sentiment, etc.
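
A toy sketch of the underlying idea using scikit-learn; the platform ships trained components, so the tiny training set here is purely illustrative:

```python
# A toy text-classification sketch with a TF-IDF + logistic regression
# pipeline. The training examples are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["stocks rally as markets open",
               "team wins the championship final",
               "central bank raises interest rates",
               "star striker scores a hat-trick"]
train_labels = ["business", "sports", "business", "sports"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)
print(clf.predict(["markets rally after the bank raises rates"]))  # likely 'business'
```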

Clustering

Carry out cluster analysis on crawled data to identify patterns and use them in your application.
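
For example, a minimal k-means pass over TF-IDF vectors of crawled titles (the sample titles and k=2 are chosen only for illustration):

```python
# A small clustering sketch: group crawled page titles with k-means
# over TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

titles = ["cheap flights to paris", "cheap flight deals to paris",
          "best pizza in new york", "best pizza places in town"]
X = TfidfVectorizer().fit_transform(titles)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# The flight titles and the pizza titles should fall into separate clusters.
print(list(zip(titles, labels)))
```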

Aggregation

Get insights into the data using our customisable aggregation components.
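
As an illustration, a typical aggregation such as record counts and average ratings per city (field names are hypothetical):

```python
# An aggregation sketch with pandas over hypothetical extracted records.
import pandas as pd

records = [{"city": "Pune", "rating": 4.5},
           {"city": "Pune", "rating": 3.5},
           {"city": "Delhi", "rating": 4.0}]
df = pd.DataFrame(records)
print(df.groupby("city")["rating"].agg(["count", "mean"]))
```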

Add-On Analysis

Easily add custom analysis components, such as enrichment or additional data sources (other than the web), to your crawl pipeline.

Crawler Hosting

Datoin Cloud Hosting

Forget about maintenance, scalability issues, and the like. Use the Datoin-hosted crawler to get data and focus on your core problem.

On-Premise or Private Cloud

The undocking feature that comes with the Datoin Platform lets us deploy on your private cloud.

Control

Focused Crawling

Our crawler learns the paths to pages that contain the information you need, optimising your crawl.
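
The idea, in a deliberately simplified sketch that scores outlinks by keyword hits in the URL (a stand-in for a learned model):

```python
# A toy focused-crawling sketch: score outlinks and follow the most
# promising ones first. The keywords and URLs are hypothetical.
def score(url, keywords=("review", "menu", "restaurant")):
    return sum(kw in url for kw in keywords)

def prioritise(frontier):
    """Return outlinks ordered from most to least promising."""
    return [u for _, u in sorted((-score(u), u) for u in frontier)]

print(prioritise(["https://example.com/about",
                  "https://example.com/restaurant/menu",
                  "https://example.com/contact"]))
```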

Continuous Crawling

Continuously get fresh data, such as news and product prices.

Scheduled Crawling

You may want to schedule your crawl to check for new data periodically.
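
A sketch of the idea using the third-party `schedule` package, purely to illustrate periodic re-crawls; on the platform the schedule is configured rather than coded, and `run_crawl` is a hypothetical hook:

```python
# Periodic crawl scheduling sketch: re-run the crawl nightly at 02:00.
import time
import schedule

def run_crawl():
    print("kicking off the crawl...")  # hypothetical crawl trigger

schedule.every().day.at("02:00").do(run_crawl)

while True:
    schedule.run_pending()
    time.sleep(60)
```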

JavaScript Ready

Use a JavaScript engine to tackle websites that rely on AJAX and JavaScript to enhance the user experience.
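
For example, rendering a JavaScript-heavy page with a headless browser (Playwright is used here only for illustration) before extraction:

```python
# A sketch of fetching fully rendered HTML with a headless browser.
from playwright.sync_api import sync_playwright

def fetch_rendered_html(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for AJAX content
        html = page.content()
        browser.close()
    return html
```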

Export Data

Data Formats

Get the data in whatever format you need, such as XML, JSON, Excel, CSV, etc.
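
For instance, the same records exported as JSON and CSV with the Python standard library; Excel and XML exports follow the same pattern:

```python
# An export sketch: write hypothetical records as JSON and CSV.
import csv
import json

records = [{"name": "Cafe X", "city": "Pune", "rating": 4.5}]

with open("export.json", "w") as f:
    json.dump(records, f, indent=2)

with open("export.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "city", "rating"])
    writer.writeheader()
    writer.writerows(records)
```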

APIs

Convert your crawler into REST APIs and integrate the data directly into your applications!
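
A minimal sketch of what serving crawled records over REST looks like (Flask is used here purely for illustration, and the records list stands in for the crawler's output store):

```python
# A tiny REST endpoint serving crawled records.
from flask import Flask, jsonify

app = Flask(__name__)
records = [{"name": "Cafe X", "city": "Pune", "rating": 4.5}]

@app.get("/api/records")
def list_records():
    return jsonify(records)

if __name__ == "__main__":
    app.run(port=8080)
```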

Web hooks

Export data to cloud storage (Amazon S3), hosted search indices (Solr, Elasticsearch), databases (MongoDB, MySQL), and many more.
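
A sketch of the webhook pattern: each exported batch is POSTed to a consumer endpoint. The URL and payload shape here are hypothetical; S3, Solr, Elasticsearch, and databases would be reached through their respective clients in the same push style.

```python
# A webhook-style push sketch: POST each exported batch to a consumer.
import requests

WEBHOOK_URL = "https://example.com/ingest"  # hypothetical consumer endpoint

def push_batch(records):
    resp = requests.post(WEBHOOK_URL, json={"records": records}, timeout=30)
    resp.raise_for_status()

push_batch([{"name": "Cafe X", "city": "Pune", "rating": 4.5}])
```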

Customised Extraction

Tailor-made extraction to suit your needs. Send us your requirements and our team will work with you to extract the data you need.

Interested in how Datoin can be useful to you?

Request a Demo »
