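This blog is where I organize the new technologies and information I come across while working as a developer in the field. I have been lucky enough to work as a consultant on projects for large companies in the US, so I get many chances to try new technologies. I would like to share information about the tools used in US IT projects with as many people as possible.
솔웅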


  1. 2019.05.15 AWS SageMaker - xgboost : Create Files and save it to S3



1. Imported Libraries

pandas : https://en.wikipedia.org/wiki/Pandas_(software)

 


In computer programming, pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.

 

https://pandas.pydata.org/

 


 

numpy : https://en.wikipedia.org/wiki/NumPy

 


NumPy (pronounced /ˈnʌmpaɪ/ (NUM-py) or sometimes /ˈnʌmpi/ (NUM-pee)) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.

 

https://www.numpy.org/

 


NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy is licensed under the BSD license, enabling reuse with few restrictions.
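
To make the first two bullet points above a bit more concrete, here is a small sketch of an N-dimensional array and of broadcasting. The array values are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# A 2-D array (3 rows, 2 columns) and a 1-D array with one value per column.
table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])
scale = np.array([2.0, 0.5])

# Broadcasting stretches `scale` across every row, so no explicit loop is needed.
scaled = table * scale

# A linear algebra routine: the 2x3 transpose multiplied by the 3x2 table.
gram = table.T @ table

print(scaled)
print(gram)
```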

 

boto3 : used for interacting with S3   https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

 


Boto is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. Boto provides an easy to use, object-oriented API, as well as low-level access to AWS services.

 

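sagemaker.amazon.common : module of the SageMaker Python SDK that provides write_numpy_to_dense_tensor and read_records (see steps 23 and 24)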

 

2. numpy.random.seed : https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html

 


numpy.random.seed(seed=None)

Seed the generator.

This method is called when RandomState is initialized. It can be called again to re-seed the generator. For details, see RandomState.

Parameters:

seed : int or 1-d array_like, optional

Seed for RandomState. Must be convertible to 32 bit unsigned integers.

See also

RandomState
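
A quick sketch of why the seed matters: re-seeding with the same value reproduces the same random sequence. The seed value 5 is just an assumption for the example.

```python
import numpy as np

np.random.seed(5)                      # seed the global generator
first = np.random.random_sample(3)

np.random.seed(5)                      # re-seed with the same value
second = np.random.random_sample(3)

# Same seed, same sequence of draws.
print(np.array_equal(first, second))   # True
```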

 

3. numpy.random.random_sample(size=None) : https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.random_sample.html?highlight=random%20random_sample#numpy.random.random_sample

 


numpy.random.randint : https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html
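
A minimal sketch of the two calls. The sample size of 10 and the integer range are assumptions for illustration; the point is that random_sample draws floats from [0.0, 1.0) and randint excludes its upper bound.

```python
import numpy as np

np.random.seed(5)

# 10 floats drawn uniformly from the half-open interval [0.0, 1.0).
floats = np.random.random_sample(10)

# 10 integers from 1 up to, but not including, 100.
ints = np.random.randint(1, 100, size=10)

print(floats)
print(ints)
```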

 


4. 

5. df : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
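
As a rough sketch, a DataFrame can be built from a dict of NumPy arrays. The column names x1, x2, x3, and y and the value ranges are hypothetical; they only stand in for the feature and target columns mentioned in the later steps.

```python
import numpy as np
import pandas as pd

np.random.seed(5)
n = 10  # number of rows (assumed)

# Each dict key becomes a column; every array must have the same length.
df = pd.DataFrame({
    "x1": np.random.random_sample(n),
    "x2": np.random.randint(100, 200, n),
    "x3": np.random.random_sample(n) * 10,
    "y": np.random.randint(0, 2, n),   # 0/1 target column
})

print(df)
```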

 


 

 

 

6. df - Print values

7. Save to a file with pandas.DataFrame.to_csv : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
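
A small sketch of saving a DataFrame as CSV. The file name demo_file.csv and the data are made up, and the file goes to a temporary directory so nothing is overwritten.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x1": [0.1, 0.2], "y": [0, 1]})

# Write to a temporary directory; index=False leaves out the row-index column.
path = os.path.join(tempfile.mkdtemp(), "demo_file.csv")
df.to_csv(path, index=False)

with open(path) as f:
    text = f.read()
print(text)
```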

 


8. Function : takes three parameters and saves the file to S3

9. Function : uses boto3 to download the file from the S3 bucket

boto3.Session().resource('s3') : https://boto3.amazonaws.com/v1/documentation/api/latest/guide/session.html
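
The two functions from steps 8 and 9 might look roughly like the sketch below. The function names and the three parameters (filename, bucket, key) are my assumptions, and the optional fourth parameter s3 is something I added so the functions can be tried with a stand-in object instead of a real AWS account.

```python
def _default_s3_resource():
    # boto3 is imported here so the module can be loaded without AWS set up.
    import boto3
    return boto3.Session().resource("s3")


def write_to_s3(filename, bucket, key, s3=None):
    """Upload the local file `filename` to the object `key` in `bucket`."""
    if s3 is None:
        s3 = _default_s3_resource()
    with open(filename, "rb") as f:  # upload_fileobj expects a binary file object
        return s3.Bucket(bucket).Object(key).upload_fileobj(f)


def download_from_s3(filename, bucket, key, s3=None):
    """Download the object `key` in `bucket` to the local file `filename`."""
    if s3 is None:
        s3 = _default_s3_resource()
    with open(filename, "wb") as f:  # download_fileobj writes bytes
        return s3.Bucket(bucket).Object(key).download_fileobj(f)
```

With valid credentials, a call would look like write_to_s3("demo_file.csv", "my-bucket", "demo/demo_file.csv"), where the bucket name and key are placeholders.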

 


10. Run the function from step 8 to save the file to S3

upload_fileobj : https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html?highlight=upload_fileobj#S3.Bucket.upload_fileobj

 


11. Run the function from step 9 to download the file from S3

Bucket object download_fileobj : https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html?highlight=download_fileobj#S3.Bucket.download_fileobj

 


12. 

13. Print the first 5 rows of the data

df.head() : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html
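
A short sketch of head() on made-up data: with no argument it returns the first 5 rows, and an explicit n changes that.

```python
import pandas as pd

df = pd.DataFrame({"x1": range(10), "y": [i % 2 for i in range(10)]})

first_five = df.head()    # n defaults to 5
first_three = df.head(3)  # explicit row count

print(first_five)
```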

 


14. Put the selected columns into a matrix (as_matrix returns them as a NumPy array)

pandas.DataFrame.as_matrix : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.as_matrix.html?highlight=as_matrix#pandas.DataFrame.as_matrix
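
Note that as_matrix was deprecated in pandas 0.23 and removed in pandas 1.0, so the sketch below uses to_numpy (available from 0.24) to do the same job on newer versions. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "x1": [0.1, 0.2, 0.3],
    "x2": [100, 150, 199],
    "y": [0, 1, 0],
})

# Select the feature columns and convert them to a 2-D NumPy array.
# On old pandas versions this was df.as_matrix(columns=["x1", "x2"]).
X = df[["x1", "x2"]].to_numpy()

print(X.shape)
print(X)
```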

 


15. X values

16. ???

17. Put the y column into a matrix

 

18. Shape of y: 10 rows, 1 column

19. y values

20. Flatten y into a single row

numpy.ravel : https://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html
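
Steps 17 to 20 can be sketched as follows, again with made-up data: selecting the y column as a one-column matrix gives shape (10, 1), and ravel flattens it to a 1-D array of length 10.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"y": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]})

# A one-column selection keeps two dimensions: 10 rows, 1 column.
y = df[["y"]].to_numpy()
print(y.shape)       # (10, 1)

# ravel returns the same values as a flat 1-D array.
y_flat = np.ravel(y)
print(y_flat.shape)  # (10,)
print(y_flat)
```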

 


21. y values

23. Function : writes the given data to a file with write_numpy_to_dense_tensor

write_numpy_to_dense_tensor : https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/amazon/common.py

 


read_records

24. Function : reads the records back from the file with read_records
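
The functions from steps 23 and 24 might look roughly like this. It is only a sketch: the function names come from the later steps, the parameter names are my assumptions, and the code needs the sagemaker package installed. As far as I can tell from the linked source, write_numpy_to_dense_tensor expects the labels as a 1-D vector, which would explain the ravel call in step 20.

```python
def write_recordio_file(filename, x, y=None):
    """Write feature array `x` (and optional 1-D labels `y`) in RecordIO format."""
    # Imported here so the module can be loaded without the SageMaker SDK.
    import sagemaker.amazon.common as smac
    with open(filename, "wb") as f:
        smac.write_numpy_to_dense_tensor(f, x, y)


def read_recordio_file(filename, records_to_print=10):
    """Print up to `records_to_print` records from a RecordIO file."""
    import sagemaker.amazon.common as smac
    with open(filename, "rb") as f:
        for i, record in enumerate(smac.read_records(f)):
            if i >= records_to_print:
                break
            print("record: {}".format(i))
            print(record)
```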

25. Run the write_recordio_file function

 

 

26. Print only the first 3 rows

27. Run the read_recordio_file function

32. Save the file to S3

33. Download the file from S3

 
