
Inserting Scrapy Data into a Database

I. Installing MySQL

  • Install MySQL

    sudo apt-get install mysql-server
  • Install the Python MySQL bindings (MySQLdb)

    sudo apt-get install python-mysqldb
  • Install a Python driver for MySQL

    sudo pip install pymysql

Tip: do not leave the root password empty during installation.
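
A quick way to confirm that the drivers installed correctly is to try importing them. The following is only a minimal sketch; the helper name available_drivers is made up for illustration and is not part of any library.

```python
import importlib


def available_drivers(names=("MySQLdb", "pymysql")):
    """Return the module names from `names` that can be imported."""
    found = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Module is missing or failed to import; skip it.
            continue
        found.append(name)
    return found


if __name__ == "__main__":
    print("Importable MySQL drivers:", available_drivers())
```

If MySQLdb is missing from the printed list, the pipeline in section III will fail at import time.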

II. Creating the Database

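  • Log in to MySQL as root (enter the password set during installation)

    mysql -u root -p
  • Create the database (named db)

    create database db;
  • Grant privileges

    GRANT ALL PRIVILEGES ON db.* TO star@localhost IDENTIFIED BY "123456";

    Tips:
    db is the database just created
    star is the new database user
    123456 is that user's password

  • Select the database (tables will be created in it)

    use db;
  • Set the character encoding

    alter database db character set utf8;
  • Check that the encoding was applied

    show variables like 'character_set_%';

Then create the table yourself (not covered in detail here).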
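
The pipeline in section III reads and writes a table named wooyun with the columns name, time and url. As a rough sketch, such a table could be created through any DB-API 2.0 connection. The column types and the helper name create_wooyun_table are assumptions, since the post does not specify a schema.

```python
# Assumed schema: the column names come from the pipeline's queries,
# but the types and lengths are guesses; adjust them to your data.
CREATE_WOOYUN_TABLE = (
    "CREATE TABLE IF NOT EXISTS wooyun ("
    " name VARCHAR(255) NOT NULL,"
    " time VARCHAR(64),"
    " url VARCHAR(255)"
    ")"
)


def create_wooyun_table(conn):
    """Create the wooyun table on an open DB-API 2.0 connection."""
    cursor = conn.cursor()
    try:
        cursor.execute(CREATE_WOOYUN_TABLE)
        conn.commit()
    finally:
        cursor.close()
```

With MySQLdb, the connection would be opened with something like MySQLdb.connect(db='db', user='star', passwd='123456', charset='utf8') before calling create_wooyun_table(conn).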

III. Inserting Data from Scrapy

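  • Add the following to settings.py
    ITEM_PIPELINES = {'wooyun.pipelines.WooyunPipeline': 300}

Tips:
wooyun is the Scrapy project name
WooyunPipeline is the pipeline class name
300 is the pipeline's order value (0 to 1000; lower values run first)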

  • pipelines.py
    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: http://doc.scrapy.org/topics/item-pipeline.html
    #
    import logging

    import MySQLdb
    import MySQLdb.cursors
    from twisted.enterprise import adbapi

    logger = logging.getLogger(__name__)


    class WooyunPipeline(object):

        def __init__(self):
            # Connection pool; queries run in Twisted worker threads.
            # db, user and passwd must match the database and account
            # created in section II.
            self.dbpool = adbapi.ConnectionPool(
                'MySQLdb', db='db', user='star', passwd='123456',
                cursorclass=MySQLdb.cursors.DictCursor,
                charset='utf8', use_unicode=True)

        def process_item(self, item, spider):
            # run db query in thread pool
            query = self.dbpool.runInteraction(self._conditional_insert, item)
            query.addErrback(self.handle_error)
            return item

        def _conditional_insert(self, tx, item):
            # create the record if it doesn't exist.
            # this whole block runs in its own thread
            tx.execute("select * from wooyun where name = %s",
                       (item['name'][0],))
            result = tx.fetchone()
            if result:
                logger.debug("Item already stored in db: %s", item)
            else:
                tx.execute(
                    "insert into wooyun (name, time, url) "
                    "values (%s, %s, %s)",
                    (item['name'][0],
                     item['time'][0],
                     item['url'][0]))
                logger.debug("Item stored in db: %s", item)

        def handle_error(self, e):
            # e is a twisted.python.failure.Failure
            logger.error("Database error: %s", e.getErrorMessage())
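
To try the check-then-insert logic of _conditional_insert without a MySQL server or a running crawl, the same steps can be exercised against an in-memory SQLite database. This is only a stand-in sketch: sqlite3 uses ? placeholders where MySQLdb uses %s, the function name insert_if_new and the sample item are made up for illustration, and the table schema is assumed.

```python
import sqlite3


def insert_if_new(cursor, item):
    """Insert item unless a row with the same name already exists.

    Returns True if a row was inserted, False if it was skipped.
    Item fields are lists, as in the pipeline, so [0] takes the first value.
    """
    cursor.execute("select 1 from wooyun where name = ?", (item["name"][0],))
    if cursor.fetchone():
        return False
    cursor.execute(
        "insert into wooyun (name, time, url) values (?, ?, ?)",
        (item["name"][0], item["time"][0], item["url"][0]),
    )
    return True


def demo():
    """Insert the same hypothetical item twice and report what happened."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Assumed schema; the post leaves table creation to the reader.
    cur.execute("create table wooyun (name text, time text, url text)")
    item = {
        "name": ["example entry"],
        "time": ["2014-01-01"],
        "url": ["http://example.com/1"],
    }
    first = insert_if_new(cur, item)
    second = insert_if_new(cur, item)
    cur.execute("select count(*) from wooyun")
    rows = cur.fetchone()[0]
    conn.close()
    return first, second, rows
```

Calling demo() stores the sample item on the first attempt and skips it on the second, so the table ends up with a single row.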