From https://github.com/guipsamora/pandas_exercises
This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.
Step 1. Import the necessary libraries
import pandas as pd
import numpy as np
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called chipo.
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url,sep='\t')
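The same call can be tried without a network connection. A minimal sketch, assuming a small in-memory sample: the three rows below mirror the first entries shown in Step 4, and the names `sample_tsv` and `sample` are illustrative, not part of the exercise.

```python
import io

import pandas as pd

# Three rows laid out like the dataset: tab-separated, header on the first line.
# The empty choice_description field in the first row is read as NaN.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "1\t1\tNantucket Nectar\t[Apple]\t$3.39\n"
)

# sep='\t' tells read_csv to split fields on tabs instead of commas.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample)
```

Without `sep='\t'`, `read_csv` would look for commas and would not split these rows into the five expected columns.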
Step 4. See the first 10 entries
# Solution 1
chipo[:10]
|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips | NaN | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |
# Solution 2
chipo.head(10)
|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips | NaN | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |
Step 5. What is the number of observations in the dataset?
type(chipo)
pandas.core.frame.DataFrame
# Solution 1
len(chipo.index)
4622
# Solution 2
chipo.shape[0]
4622
# Solution 3
chipo.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id 4622 non-null int64
quantity 4622 non-null int64
item_name 4622 non-null object
choice_description 3376 non-null object
item_price 4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
Step 6. What is the number of columns in the dataset?
# Solution 1
len(chipo.columns)
5
# Solution 2
chipo.shape[1]
5
Step 7. Print the name of all the columns.
list(chipo.columns)
['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
Step 8. How is the dataset indexed?
chipo.index
RangeIndex(start=0, stop=4622, step=1)
Step 9. Which was the most-ordered item?
c = chipo.groupby('item_name')
c = c.sum(numeric_only=True)  # sum only the numeric columns
c = c.sort_values(['quantity'],ascending=False)
c['quantity'].head(1)
item_name
Chicken Bowl 761
Name: quantity, dtype: int64
Step 10. For the most-ordered item, how many items were ordered?
c = chipo.groupby('item_name')
c = c.sum(numeric_only=True)  # sum only the numeric columns
c = c.sort_values(['quantity'],ascending=False)
c['quantity'].head(1)
item_name
Chicken Bowl 761
Name: quantity, dtype: int64
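Steps 9 and 10 can also be answered from a single per-item Series rather than by sorting the whole grouped frame. A sketch on a small hypothetical frame (the values in `toy` are made up for illustration and are not taken from the dataset):

```python
import pandas as pd

# Hypothetical order lines; "Chicken Bowl" appears twice.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Side of Chips"],
    "quantity": [2, 1, 1, 1],
})

# Total quantity per item, indexed by item_name.
per_item = toy.groupby("item_name")["quantity"].sum()

top_item = per_item.idxmax()    # label of the largest total (Step 9)
top_quantity = per_item.max()   # the largest total itself (Step 10)
print(top_item, top_quantity)
```

Selecting the `quantity` column before calling `sum()` keeps the aggregation away from the text columns entirely.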
Step 11. What was the most ordered item in the choice_description column?
c = chipo.groupby('choice_description')
c = c.sum(numeric_only=True)  # sum only the numeric columns
c = c.sort_values(['quantity'],ascending=False)
c.head(1)
| choice_description | order_id | quantity |
|---|---|---|
| [Diet Coke] | 123455 | 159 |
Step 12. How many items were ordered in total?
chipo['quantity'].sum()
4972
Step 13. Turn the item price into a float
Step 13.a. Check the item price type
chipo['item_price'].dtypes
dtype('O')
Step 13.b. Create a lambda function and change the type of item price
chipo['item_price'] = chipo['item_price'].apply(lambda x: x.replace('$', '')).astype(np.float64)
# dollarizer = lambda x:float(x[1:-1])
# chipo.item_price = chipo.item_price.apply(dollarizer)
Step 13.c. Check the item price type
chipo['item_price'].dtypes
dtype('float64')
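The conversion in Step 13 can also be written with pandas' vectorized string methods instead of a lambda. A minimal sketch, assuming prices stored as strings with a leading `$` and possibly stray whitespace (the `prices` Series is a made-up sample):

```python
import pandas as pd

# Hypothetical price strings; the second one carries a trailing space.
prices = pd.Series(["$2.39", "$3.39 ", "$16.98"])

as_float = (
    prices.str.replace("$", "", regex=False)  # drop the literal dollar sign
          .str.strip()                        # remove surrounding whitespace
          .astype(float)                      # parse the remaining text as float64
)
print(as_float.dtype)
```

`regex=False` matters here because `$` is an end-of-string anchor in a regular expression rather than a literal character.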
Step 14. How much was the revenue for the period in the dataset?
(chipo['quantity']*chipo['item_price']).sum()
39237.02
Step 15. How many orders were made in the period?
# Solution 1
g = chipo.groupby(['order_id'])
g.ngroups
1834
# Solution 2
orders = chipo.order_id.value_counts().count()
orders
1834
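Counting distinct values is what `Series.nunique()` does directly, so it can stand in for both solutions in Step 15. A sketch on a hypothetical column of order ids (`toy` is illustrative only):

```python
import pandas as pd

# Six order lines that belong to three distinct orders.
toy = pd.DataFrame({"order_id": [1, 1, 2, 3, 3, 3]})

n_orders = toy["order_id"].nunique()

# Cross-check against the two approaches used in Step 15.
via_groups = toy.groupby("order_id").ngroups
via_counts = toy["order_id"].value_counts().count()
print(n_orders, via_groups, via_counts)
```

The same call applies to any column whose distinct values need counting, such as `item_name`.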
Step 16. What is the average revenue amount per order?
# Solution 1
chipo['revenue'] = chipo['quantity']*chipo['item_price']
order_grouped = chipo.groupby(by=['order_id']).sum(numeric_only=True)
order_grouped.mean()['revenue']
21.394231188658654
# Solution 2
chipo.groupby(by=['order_id']).sum(numeric_only=True).mean()['revenue']
21.394231188658654
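The arithmetic behind Step 16 is easier to follow on a frame small enough to check by hand. A sketch with made-up numbers (`toy` and its values are hypothetical): revenue is computed per line, summed per order, then averaged across orders.

```python
import pandas as pd

# Two orders: order 1 has two lines, order 2 has one.
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 10.0],
})

# Line revenue: 1*2.0 = 2.0, 2*3.0 = 6.0, 1*10.0 = 10.0
toy["revenue"] = toy["quantity"] * toy["item_price"]

# Order totals: order 1 -> 8.0, order 2 -> 10.0
per_order = toy.groupby("order_id")["revenue"].sum()

# Mean of the order totals: (8.0 + 10.0) / 2 = 9.0
avg_revenue = per_order.mean()
print(avg_revenue)
```

Averaging the `revenue` column directly would give the mean per order line (6.0 here) and not the mean per order, which is why the grouping step comes first.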
Step 17. How many different items are sold?
chipo.item_name.value_counts().count()
50