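# Pandas DataFrame Quick Tutorial
This tutorial introduces DataFrames, the central data structure in the pandas API. It is not a comprehensive DataFrames tutorial. Rather, this lab provides a very quick introduction to the parts of DataFrames needed for the other exercises in Machine Learning Crash Course.
A DataFrame is similar to an in-memory spreadsheet. Like a spreadsheet: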

- A DataFrame stores data in cells.
- A DataFrame has named columns (usually) and numbered rows.

## Import NumPy and pandas

Run the following code cell to import the NumPy and pandas modules.
```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: python_wasm
  code: |-
    import numpy as np
    import pandas as pd
  outputMode: autoParacraft
  output: ''

```
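## Creating a DataFrame
The following code cell creates a simple DataFrame containing 10 cells, organized as follows:

* 5 rows
* 2 columns, one named `temperature` and the other named `activity`

The following code cell instantiates the `pd.DataFrame` class to generate a DataFrame. The class takes two arguments:

- The first argument provides the data to populate the 10 cells. The code cell calls `np.array` to generate the 5x2 NumPy array.
- The second argument identifies the names of the two columns.

Note: Do not redefine variables in the following code cell. Subsequent code cells use these variables.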


```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: python_wasm
  code: |-
    # Create and populate a 5x2 NumPy array.
    my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])

    # Create a Python list that holds the names of the two columns.
    my_column_names = ['temperature', 'activity']

    # Create a DataFrame.
    my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)

    # Print the entire DataFrame
    print(my_dataframe)
  outputMode: autoParacraft
  output: |2
       temperature  activity
    0            0         3
    1           10         7
    2           20         9
    3           30        14
    4           40        15

```
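As a complementary sketch, `pd.DataFrame` also accepts a Python dictionary that maps column names to lists of values, which can be convenient when the data does not start out as a NumPy array. The names `my_dict` and `dict_dataframe` are illustrative and not part of the tutorial; they were chosen so that `my_dataframe` is not redefined.

```python
import pandas as pd

# Hypothetical names; the values mirror the 5x2 array used in the tutorial.
my_dict = {
    'temperature': [0, 10, 20, 30, 40],
    'activity': [3, 7, 9, 14, 15],
}

# Each dictionary key becomes a column name;
# each list supplies that column's values, one per row.
dict_dataframe = pd.DataFrame(data=my_dict)

print(dict_dataframe.shape)           # (number of rows, number of columns)
print(list(dict_dataframe.columns))   # column names, in dictionary order
```

With this form the column names and the data travel together, so a separate list of column names is not needed.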
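## Adding a new column to a DataFrame
You can add a new column to an existing pandas DataFrame by assigning values to a new column name. For example, the following code creates a third column named `adjusted` in `my_dataframe`: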


```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: python_wasm
  code: |-
    # Create a new column named adjusted.
    my_dataframe["adjusted"] = my_dataframe["activity"] + 2

    # Print the entire DataFrame
    print(my_dataframe)
  outputMode: autoParacraft
  output: |2
       temperature  activity  adjusted
    0            0         3         5
    1           10         7         9
    2           20         9        11
    3           30        14        16
    4           40        15        17

```
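Assigning to a new column name modifies the DataFrame in place. If you would rather leave the original untouched, one alternative is `DataFrame.assign`, which returns a new DataFrame with the extra column. The sketch below rebuilds the tutorial's data under the hypothetical name `demo` so that `my_dataframe` is not affected; the column name `temp_plus_activity` is also invented for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's data under a hypothetical name.
demo = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=['temperature', 'activity'],
)

# assign() returns a new DataFrame that includes the extra column.
# The new column is the row-by-row sum of two existing columns.
demo_with_sum = demo.assign(
    temp_plus_activity=demo['temperature'] + demo['activity']
)

print(demo_with_sum)
print(list(demo.columns))  # demo itself still has only its two original columns
```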
## Specifying a subset of a DataFrame
Pandas provides multiple ways to isolate specific rows, columns, slices, or cells in a DataFrame.
 
```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: python_wasm
  code: |-
    print("Rows #0, #1, and #2:")
    print(my_dataframe.head(3), '\n')

    print("Row #2:")
    print(my_dataframe.iloc[[2]], '\n')

    print("Rows #1, #2, and #3:")
    print(my_dataframe[1:4], '\n')

    print("Column 'temperature':")
    print(my_dataframe['temperature'])
  outputMode: autoParacraft
  output: |
    Rows #0, #1, and #2:
       temperature  activity  adjusted
    0            0         3         5
    1           10         7         9
    2           20         9        11 

    Row #2:
       temperature  activity  adjusted
    2           20         9        11 

    Rows #1, #2, and #3:
       temperature  activity  adjusted
    1           10         7         9
    2           20         9        11
    3           30        14        16 

    Column 'temperature':
    0     0
    1    10
    2    20
    3    30
    4    40
    Name: temperature, dtype: int32

```
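Beyond the selections shown above, a few other common patterns are single-cell access with `at`, row filtering with a boolean condition, and combined row and column selection by position with `iloc`. The following is a minimal sketch on a rebuilt copy of the tutorial's data; the names `demo`, `cell`, `warm_rows`, and `block` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's data under a hypothetical name.
demo = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=['temperature', 'activity'],
)

# A single cell, addressed by row label and column name.
cell = demo.at[2, 'activity']

# Only the rows for which a condition on one column holds.
warm_rows = demo[demo['temperature'] >= 20]

# Rows and columns together, by integer position:
# rows at positions 1 and 2, and the first column only.
block = demo.iloc[1:3, [0]]

print(cell)
print(warm_rows)
print(block)
```

Note that `iloc` slices exclude the end position, just like Python list slices, so `1:3` selects positions 1 and 2.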
## Task 1: Create a DataFrame

Do the following:

  1. Create a 3x4 (3 rows x 4 columns) pandas DataFrame in which the columns are named `Eleanor`, `Chidi`, `Tahani`, and `Jason`. Populate each of the 12 cells in the DataFrame with a random integer between 0 and 100, inclusive.

  2. Output the following:

     * the entire DataFrame
     * the value in the cell of row #1 of the `Eleanor` column

  3. Create a fifth column named `Janet`, which is populated with the row-by-row sums of `Tahani` and `Jason`.

To complete this task, it helps to know the NumPy basics covered in the NumPy UltraQuick Tutorial. 

```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: python_wasm
  code: '# Write your code here.'
  outputMode: autoParacraft
  output: ''

```
Double-click for a solution to Task 1.
```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: python_wasm
  code: |-
    #@title Double-click for a solution to Task 1.

    # Create a Python list that holds the names of the four columns.
    my_column_names = ['Eleanor', 'Chidi', 'Tahani', 'Jason']

    # Create a 3x4 numpy array, each cell populated with a random integer.
    my_data = np.random.randint(low=0, high=101, size=(3, 4))

    # Create a DataFrame.
    df = pd.DataFrame(data=my_data, columns=my_column_names)

    # Print the entire DataFrame
    print(df)

    # Print the value in row #1 of the Eleanor column.
    print("\nSecond row of the Eleanor column: %d\n" % df['Eleanor'][1])

    # Create a column named Janet whose contents are the sum
    # of two other columns.
    df['Janet'] = df['Tahani'] + df['Jason']

    # Print the enhanced DataFrame
    print(df)
  outputMode: autoParacraft
  output: ''

```
## Copying a DataFrame (optional)

Pandas provides two different ways to duplicate a DataFrame:

* **Referencing.** If you assign a DataFrame to a new variable, any change to the DataFrame or to the new variable will be reflected in the other. 
* **Copying.** If you call the `pd.DataFrame.copy` method, you create a true independent copy.  Changes to the original DataFrame or to the copy will not be reflected in the other. 

The difference is subtle, but important.

```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: jupyter_server_container
  code: >-
    # Create a reference by assigning df (from the Task 1 solution) to a new variable.

    print("Experiment with a reference:")

    reference_to_df = df


    # Print the starting value of a particular cell.

    print("  Starting value of df: %d" % df['Jason'][1])

    print("  Starting value of reference_to_df: %d\n" %
    reference_to_df['Jason'][1])


    # Modify a cell in df.

    df.at[1, 'Jason'] = df['Jason'][1] + 5

    print("  Updated df: %d" % df['Jason'][1])

    print("  Updated reference_to_df: %d\n\n" % reference_to_df['Jason'][1])


    # Create a true copy of my_dataframe

    print("Experiment with a true copy:")

    copy_of_my_dataframe = my_dataframe.copy()


    # Print the starting value of a particular cell.

    print("  Starting value of my_dataframe: %d" % my_dataframe['activity'][1])

    print("  Starting value of copy_of_my_dataframe: %d\n" %
    copy_of_my_dataframe['activity'][1])


    # Modify a cell in my_dataframe.

    my_dataframe.at[1, 'activity'] = my_dataframe['activity'][1] + 3

    print("  Updated my_dataframe: %d" % my_dataframe['activity'][1])

    print("  copy_of_my_dataframe does not get updated: %d" %
    copy_of_my_dataframe['activity'][1])
  outputMode: autoParacraft
  output: ''

```
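One way to check which of the two situations you are in is Python's `is` operator, which tests whether two names refer to the same object. The following is a minimal, self-contained sketch; the names `original`, `reference`, and `independent` are hypothetical and do not touch the tutorial's variables.

```python
import pandas as pd

original = pd.DataFrame({'activity': [3, 7, 9]})

reference = original            # a second name for the same object
independent = original.copy()   # a new object with its own data (deep copy by default)

# Modify one cell through the original name.
original.at[1, 'activity'] = 100

print(reference is original)            # True: both names refer to one DataFrame
print(independent is original)          # False: the copy is a separate DataFrame
print(reference.at[1, 'activity'])      # 100: the change is visible through the reference
print(independent.at[1, 'activity'])    # 7: the copy keeps the old value
```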

<div class="markdown-google-sans">

## Data science
</div>

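With the platform, you can harness the full power of popular Python libraries to analyze and visualize data. The code cell below uses <strong>NumPy</strong> to generate some random data and <strong>Matplotlib</strong> to visualize it. To edit the code, just click the cell and start editing.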

```@CodeBlock
styleID: 0
codeblock:
  projectId: ''
  title: ''
  name: ''
  language: python_wasm
  code: |-
    import numpy as np
    from matplotlib import pyplot as plt

    ys = 200 + np.random.randn(100)
    x = [x for x in range(len(ys))]

    plt.plot(x, ys, '-')
    plt.fill_between(x, ys, 195, where=(ys > 195), facecolor='g', alpha=0.6)

    plt.title("Sample Visualization")
    plt.show()
  outputMode: autoParacraft
  output: ''
  output_image: ''

```