Programming with the Amazon S3 Cloud Storage Service

2011-01-04 18:48

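Amazon Simple Storage Service (S3) is a cloud storage platform and one of the typical applications of today's fast-growing cloud computing. Users upload their data to servers in the cloud and can then access it anytime, from anywhere, flexibly and efficiently. S3 is billed on demand: you pay for the storage capacity you actually use. The specific rates are listed on Amazon's pricing page. For enterprise users, the service can substantially reduce costs, which include not only buying server hardware and software but also electricity, the staff hired to maintain IT infrastructure, and so on.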

Amazon S3 is built around the following concepts. Going through them one by one gives a rough picture of how cloud storage works.

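Buckets: A bucket is a container for storage; loosely speaking, it is a folder in the cloud. Each bucket needs a globally unique name, much like registering an email address, and you can add a prefix or suffix to avoid collisions. Buckets organize the namespace at the highest level and play an important role in access control. For example, if an object with the key photos/puppy.jpg is stored in a bucket named johnsmith, it can be reached at this URL: http://johnsmith.s3.amazonaws.com/photos/puppy.jpg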

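Objects: Objects are the basic entities stored in S3. An object consists of object data and metadata. The metadata is a set of name-value pairs that describe the object. By default it includes items such as the content type and the last-modified time, and users can also define their own metadata.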

Keys: A key is the unique identifier of an object within a bucket. In the example above, photos/puppy.jpg is a key.

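Access Control Lists: Every bucket and every object in S3 has an access control list (ACL), and bucket ACLs and object ACLs are independent of each other. When a user sends a request, S3 checks the ACL to verify that the requester is allowed to access the bucket or object.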

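Regions: You can choose the physical region in which a bucket is stored. Picking a suitable region can reduce latency and lower costs. Amazon has built data centers around the world; at the time of writing, S3 supports the following regions: US Standard, US (Northern California), EU (Ireland), and APAC (Singapore).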
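To make the relationship between buckets, keys, objects, metadata, and ACLs concrete, here is a minimal in-memory sketch. It does not talk to S3. The names `ToyObject`, `ToyBucket`, and `object_url` are hypothetical and exist only for this illustration, and the ACL check is a deliberate simplification. The code is written for Python 3, unlike the Python 2 listings later in this post. Only the URL layout and the ACL values `private` and `public-read` are taken from the text.

```python
from urllib.parse import quote


class ToyObject:
    """An object: raw data plus name-value metadata and its own ACL."""

    def __init__(self, data, metadata=None, acl="private"):
        self.data = data
        self.metadata = dict(metadata or {})
        self.acl = acl


class ToyBucket:
    """A bucket: a named container that maps unique keys to objects."""

    def __init__(self, name, acl="private"):
        self.name = name
        self.acl = acl        # kept separately from each object's ACL
        self.objects = {}     # key -> ToyObject

    def put(self, key, obj):
        self.objects[key] = obj

    def anonymous_can_read(self, key):
        # Simplification: only the object's own ACL is consulted here.
        return self.objects[key].acl == "public-read"


def object_url(bucket_name, key):
    """Build the URL form used in the johnsmith example."""
    # quote() leaves "/" unescaped by default, so key prefixes stay readable.
    return "http://%s.s3.amazonaws.com/%s" % (bucket_name, quote(key))


bucket = ToyBucket("johnsmith")
bucket.put(
    "photos/puppy.jpg",
    ToyObject(b"fake image bytes", {"Content-Type": "image/jpeg"}, acl="public-read"),
)
print(object_url(bucket.name, "photos/puppy.jpg"))
print(bucket.anonymous_can_read("photos/puppy.jpg"))
```

The bucket stays `private` while the object is `public-read`, which mirrors the point that the two ACLs are independent.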

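To improve data reliability, cloud providers commonly keep redundant copies (replicas) of the same data on several different servers. If one server goes down, users can still fetch their data from another. Keeping multiple replicas raises the problem of data consistency: how do we guarantee that every replica holds the same content, and how do we let multiple users read and write concurrently? This is a classic problem in distributed system design, and I will discuss it in a separate post. Amazon's US Standard Region provides eventual consistency for all requests, while the EU and Northern California Regions provide read-after-write consistency.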
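The difference between the two consistency models can be illustrated with a toy model. This is only a sketch of the idea, not a description of how S3 replicates data internally; the class `ToyReplicatedStore` and its methods are hypothetical names invented for this example (Python 3).

```python
class ToyReplicatedStore:
    """One primary copy plus replicas that are updated lazily."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply all pending writes to every replica, in order."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending = []

    def read_eventual(self, key, replica_index=0):
        # May return stale data (or None) until replicate() has run.
        return self.replicas[replica_index].get(key)

    def read_after_write(self, key):
        # Always served from the copy that accepted the write.
        return self.primary.get(key)


store = ToyReplicatedStore()
store.write("photos/puppy.jpg", "v1")
print(store.read_eventual("photos/puppy.jpg"))     # None: replica not updated yet
print(store.read_after_write("photos/puppy.jpg"))  # v1
store.replicate()
print(store.read_eventual("photos/puppy.jpg"))     # v1
```

With eventual consistency a reader may briefly miss a fresh write, but all replicas converge once replication catches up; with read-after-write consistency a newly written value is visible immediately.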

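Back to the application level. If you want to sign up and try the S3 storage service, take a look at this post, which has detailed steps and screenshots. Although Amazon provides a friendly web console for managing data and applications in the cloud, as developers we can also use the API provided by boto to interact with S3. boto is a Python interface to Amazon's cloud services; interfaces exist for other languages as well, such as libAWS for C++, along with Java, Ruby, PHP, and so on. These APIs are not limited to S3; they can also be used to call other cloud services such as EC2. Below is a sample program with basic features such as connecting to Amazon S3 and uploading and downloading files.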

#!/usr/bin/python
#
#  Amazon S3 Interface
#  Author: Zeng, Xi
#  SID:    1010105140
#  Email:  [email protected]
connected = 0
 
def connect():
    access_key = raw_input('Your access key:').strip()
    secret_key = raw_input('Your secret key:').strip()
    from boto.s3.connection import S3Connection
    global conn
    conn = S3Connection(access_key, secret_key)
    global connected
    connected = 1
 
def creat():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        bucket_name = raw_input('Bucket name:').strip()
        bucket = conn.create_bucket(bucket_name)
 
def put():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        local_file = raw_input('Local filename:').strip()
        bucket = raw_input('Target bucket name:').strip()
        from boto.s3.key import Key
        b = conn.get_bucket(bucket)
        k = Key(b)
        k.key = local_file
        k.set_contents_from_filename(local_file)
 
def ls():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        rs = conn.get_all_buckets()
        for b in rs:
            print b.name
 
def lsfile():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        bucket = raw_input('Bucket name:').strip()
        from boto.s3.key import Key
        b = conn.get_bucket(bucket)
        file_list = b.list()
        for l in file_list:
            print l.name
 
def info():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        bucket = raw_input('Bucket name:').strip()
        filename = raw_input('Filename:').strip()
        b = conn.get_bucket(bucket)
        # Look up only the requested file instead of listing the whole bucket.
        # lookup() returns None when the key does not exist.
        key = b.lookup(filename)
        if key is None:
            print 'File not found: ' + filename
        else:
            print 'File: ' + key.name
            print 'size: ' + str(key.size)
            print 'last modified: ' + str(key.last_modified)
            print 'etag (md5): ' + str(key.etag)
 
def permission():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        while True:
            bucket = raw_input('Bucket name:').strip()
            permission = raw_input('Permission (private or public-read):').strip()
            if permission not in ['private', 'public-read']:
                print 'Input error!'
            elif permission in ['private', 'public-read']:
                break
        b = conn.get_bucket(bucket)
        b.set_acl(permission)
 
def get():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        bucket = raw_input('Source bucket name:').strip()
        s_file = raw_input('Source filename:').strip()
        d_file = raw_input('Local directory path and filename:').strip()
        b = conn.get_bucket(bucket)
        # lookup() returns None when the key does not exist in the bucket.
        key = b.lookup(s_file)
        if key is None:
            print 'File not found: ' + s_file
        else:
            key.get_contents_to_filename(d_file)
 
def delete():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        bucket = raw_input('Bucket name:').strip()
        conn.delete_bucket(bucket)
 
def delfile():
    if connected == 0:
        print 'Not connected!'
    elif connected == 1:
        bucket = raw_input('Bucket name:').strip()
        filename = raw_input('Filename:').strip()
        b = conn.get_bucket(bucket)
        b.delete_key(filename)
 
def showMenu():
    title = '''
        Amazon S3 Service
 
    connect        Get user credential and connect to Amazon S3
    creat        Create bucket
    put        Upload file to S3
    ls        List buckets
    lsfile        List files in a bucket
    info        Display information of a file
    permission    Set bucket permissions
    get        Download file from S3
    delete        Delete bucket
    delfile        Delete file
    quit        Quit
 
Enter choice:'''
    while True:
        choice = raw_input(title).strip().lower()
        choices =  ['connect','creat','put','ls','lsfile','info','permission','get','delete','delfile','quit']
        if choice not in choices:
            print('Input Error!')
        else:
            if choice == 'quit':
                break
            elif choice == 'connect':
                connect()
            elif choice == 'creat':
                creat()
            elif choice == 'put':
                put()
            elif choice == 'ls':
                ls()
            elif choice == 'lsfile':
                lsfile()
            elif choice == 'info':
                info()
            elif choice == 'permission':
                permission()
            elif choice == 'get':
                get()
            elif choice == 'delete':
                delete()
            elif choice == 'delfile':
                delfile()
if __name__ == '__main__':
    showMenu()
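As a design note, the long if/elif chain in showMenu can also be expressed as a dictionary that maps each command name to its function. The sketch below is written for Python 3 and uses stub handlers (hypothetical stand-ins for the real ls(), put(), and so on) so that it runs without boto or credentials; `make_dispatcher` is an illustrative name, not part of the program above.

```python
def make_dispatcher(handlers):
    """Return a function that runs the handler registered for a command."""
    def dispatch(choice):
        handler = handlers.get(choice.strip().lower())
        if handler is None:
            return "Input Error!"
        return handler()
    return dispatch


# Stub handlers standing in for the real S3 functions.
def ls_stub():
    return "listed buckets"


def put_stub():
    return "uploaded file"


dispatch = make_dispatcher({"ls": ls_stub, "put": put_stub})

print(dispatch(" LS "))     # input is stripped and lower-cased before lookup
print(dispatch("rename"))   # unknown command
```

Adding a command then means adding one dictionary entry rather than another elif branch and another item in the choices list.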


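For individual users, file synchronization is a very practical feature. If our computer is stolen or its hard disk fails, we can still recover earlier files from the cloud through a synchronized folder. Cloud storage also brings mobility: in an emergency we can even edit documents on a phone. Many applications of this kind already exist. The sync tool Dropbox is very popular abroad, and it actually uses Amazon S3 as its storage back end. Similar services keep appearing in China as well, such as 115 Wangpan (115网盘); Kingsoft has released Kuaipan (快盘) and T-Pan (T盘), and Xunlei has announced P-Pan (P盘).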

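The Python code below is a sample folder-synchronization program written with the boto API. It compares each file's name, size, and MD5 to decide whether the copy in the cloud matches the one in the local folder. If they differ, the file is downloaded to the local folder.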

#!/usr/bin/python
#
#  Synchronize files between local machine and the cloud storage.
#  Author: Zeng, Xi
#  SID:    1010105140
#  Email:  [email protected]
 
connected = 0
downloaded_files = ""
total_size = 0
 
def connect():
    access_key = raw_input('Your access key:').strip()
    secret_key = raw_input('Your secret key:').strip()
    from boto.s3.connection import S3Connection
    global conn
    conn = S3Connection(access_key, secret_key)
    global connected
    connected = 1
 
def sync():
    if connected == 0:
        print 'Not connected!\n'
        connect()
    if connected == 1:
        import os
        from hashlib import md5
        bucket = raw_input('Bucket name:').strip()
        local_path = raw_input('Local directory path:').strip()
        b = conn.get_bucket(bucket)
        for l in b.list():
            # os.path.join works whether or not local_path ends with a separator.
            local_file = os.path.join(local_path, l.name)
            try:
                F = open(local_file, "rb")
            except IOError, e:
                # No readable local copy: download it.
                get(bucket, l.name, local_path, l.size)
            else:
                try:
                    s = md5(F.read()).hexdigest()
                finally:
                    F.close()
                # S3 returns the ETag wrapped in double quotes.
                same_md5 = '"' + s + '"' == str(l.etag)
                same_size = os.path.getsize(local_file) == int(l.size)
                if same_md5 and same_size:
                    continue
                get(bucket, l.name, local_path, l.size)
    global downloaded_files
    global total_size
    print "Downloaded files:\n"
    print downloaded_files
    print "Total size:"
    print total_size

def get(bucket, filename, local_path, size):
    global downloaded_files
    global total_size
    import os
    target = os.path.join(local_path, filename)
    # Keys such as photos/puppy.jpg need their local subdirectory to exist.
    target_dir = os.path.dirname(target)
    if target_dir and not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    b = conn.get_bucket(bucket)
    key = b.lookup(filename)
    key.get_contents_to_filename(target)
    # Record the file only after the download has succeeded.
    downloaded_files += filename + "\n"
    total_size += size
 
if __name__ == '__main__':
    sync()
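The comparison at the heart of the sync program can be isolated into a small function that needs no network access. The following is a sketch in Python 3 under a few assumptions: `file_md5` and `needs_download` are hypothetical helper names, the "remote" ETag in the demonstration is computed locally rather than fetched from S3, and the check only makes sense for objects whose ETag is the MD5 of their content. Objects uploaded with multipart upload carry a different kind of ETag, so for them this check would always report a mismatch.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Hex MD5 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_file, remote_size, remote_etag):
    """Return True if the local copy is missing or differs from the remote one."""
    if not os.path.isfile(local_file):
        return True
    if os.path.getsize(local_file) != int(remote_size):
        return True  # cheap size check first; no need to hash
    # The ETag arrives wrapped in double quotes, so strip them before comparing.
    return file_md5(local_file) != remote_etag.strip('"')


# Demonstration with a temporary file and a locally computed stand-in ETag.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "note.txt")
    with open(path, "wb") as handle:
        handle.write(b"hello")
    etag = '"%s"' % hashlib.md5(b"hello").hexdigest()
    print(needs_download(path, 5, etag))                             # False
    print(needs_download(path, 6, etag))                             # True
    print(needs_download(os.path.join(folder, "missing"), 5, etag))  # True
```

Checking the size before hashing avoids reading files that obviously differ, and hashing in chunks keeps memory use flat for large files.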


Download the source code of the programs above: S3 interface, sync tool.
Before running them, make sure Python and boto are installed.

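Amazon's cloud computing offering is more than S3 data storage; it also includes EC2 virtual machines, the SimpleDB database, and many other services. If you are interested, see the related articles below.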

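About the author: I am currently a graduate student. If you find my articles useful, or if what I know could help with your company's projects, you may be interested in getting in touch with me.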
