Api

Api is the whole program entrance which connects items, cache, storage, handles the request from user and fetches html from source sites. For example:

from toapi import XPath, Item, Api

api = Api(base_url='https://news.ycombinator.com')


class Post(Item):
    url = XPath('//a[@class="storylink"]/@href')
    title = XPath('//a[@class="storylink"]/text()')

    class Meta:
        source = XPath('//tr[@class="athing"]')
        route = '/'


api.register(Post)
api.serve()

Arguments

base_url

The argument base_url is hostname of source web site. default = None

settings

The argument settings is the global configuration of the whole app. default = None means use default settings.


Methods

.register(self, item)

Register an item so that we could parse it.

.serve(self, ip='127.0.0.1', port=5000, **options)

Start to serve.

.parse(self, path, params=None, **kwargs)

Parse items if the path is defined in registered items.

.fetch_page_source(self, url, item, params=None, **kwargs)

Fetch html from an url.

.get_browser(self, settings, item_with_ajax=False)

Init a PhantomJS instance to the Api instance.

.update_status(self, key)

Update status of Api instance.

.get_status(self, key)

Get status of Api instance.

.set_cache(self, key, value)

Set cache. In Api instance, the value usually in type of dict.

.get_cache(self, key)

Get cache.

.set_storage(self, key, value)

Set storage.In Api instance, the value is usually a HTML.

.get_storage(self, key)

Get storage.

.parse_item(self, html, item)

Parse items from HTML.