linecache – Random access to text lines

The linecache module allows one to get any line from a Python source file, while attempting to optimize internally, using a cache, the common case where many lines are read from a single file.

Official documentation: 11.9. linecache – Random access to text lines

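linecache is designed to let other Python libraries efficiently fetch the source code of modules and packages. It is imported 21 times across the standard library, mainly by modules that need to display source code, such as traceback and pdb. Lines read through the linecache API are kept in memory (in a dict named cache), and getline then returns the requested line from there.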

01. Quickstart Tutorial

Suppose we want lines 3 through 5 of the os module. We can do this:

>>> import linecache
>>> for i in range(3, 6):
...     print(linecache.getline('os.py', i))
...
This exports:

  - all functions from posix or nt, e.g. unlink, stat, etc.

  - os.path is either posixpath or ntpath

>>>

Or use os.__file__ to get the path:

>>> import os
>>> for i in range(3, 6):
...     print(linecache.getline(os.__file__, i))
...
This exports:

  - all functions from posix or nt, e.g. unlink, stat, etc.

  - os.path is either posixpath or ntpath

>>>

Notice that every value returned by getline ends with a newline character, even for a blank line:

>>> linecache.getline('os.py', 1)
'r"""OS routines for NT or Posix depending on what system we\'re on.\n'
>>> linecache.getline('os.py', 2)
'\n'
>>> linecache.getline('os.py', 3)
'This exports:\n'
>>>
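
The trailing newline is there even when the file's last line lacks one, so callers usually strip it before display. A minimal sketch, assuming a throwaway temporary file (the file contents and the names sample_path, raw_lines and clean_lines are made up for illustration):

```python
import linecache
import os
import tempfile

# Hypothetical sample file; note that its last line has no trailing newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("alpha\n\nomega")
    sample_path = handle.name

# Every returned line ends with "\n", including the blank line and the last line.
raw_lines = [linecache.getline(sample_path, number) for number in (1, 2, 3)]

# Strip only the newline so that any indentation in the line is preserved.
clean_lines = [line.rstrip("\n") for line in raw_lines]

print(raw_lines)
print(clean_lines)

# Clean up the temporary file and the cache entry created for it.
os.remove(sample_path)
linecache.clearcache()
```

Using rstrip("\n") rather than strip() keeps leading whitespace, which matters when the line is source code.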

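What about errors, such as asking for line 500 of a file that has only 100 lines, or asking for a file that does not exist? In both cases getline returns an empty string: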

>>> import linecache
>>> linecache.getline('os.py', 100)
'    def _add(str, fn):\n'
>>> linecache.getline('os.py', 10000)    # File exists, but the line does not
''
>>> linecache.getline('__os__.py', 100)  # File does not exist
''
>>>
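
Because no exception is raised, a caller has to test for the empty string; a line that really exists is never empty, since even a blank line comes back as '\n'. A small sketch under that assumption, using a temporary three-line file (the file and the variable names are illustrative only):

```python
import linecache
import os
import tempfile

# Hypothetical three-line file created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("first\nsecond\nthird\n")
    sample_path = handle.name

existing_line = linecache.getline(sample_path, 2)               # a real line
past_the_end = linecache.getline(sample_path, 500)              # line number too large
missing_file = linecache.getline(sample_path + ".missing", 1)   # no such file

# An empty string is the only signal that the line could not be retrieved.
for label, value in [("existing", existing_line),
                     ("past the end", past_the_end),
                     ("missing file", missing_file)]:
    print(label, "->", repr(value), "(found)" if value else "(not found)")

os.remove(sample_path)
linecache.clearcache()
```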

02. HOW-TO Guides

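How does linecache choose which `os.py` to read?

In the earlier examples we passed only os.py, without a full path, so how did linecache find it? When the given filename cannot be opened as-is and is not an absolute path, linecache tries each directory in sys.path. Once it finds a matching os.py, it stores the file in the cache and returns the requested line.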
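
The sys.path lookup can be observed with a file of our own. The sketch below assumes a hypothetical file name, linecache_demo_module.py, that does not exist in the current working directory; it is written to a temporary directory that is added to sys.path only for the duration of the demo:

```python
import linecache
import os
import sys
import tempfile

# Hypothetical file name, chosen to be unlikely to clash with a real file.
module_filename = "linecache_demo_module.py"

with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, module_filename), "w") as handle:
        handle.write("answer = 42\n")

    # The directory is not on sys.path yet, so the bare name cannot be resolved.
    before = linecache.getline(module_filename, 1)

    linecache.clearcache()
    sys.path.insert(0, directory)
    try:
        # Same bare name, now resolved by joining it with a sys.path entry.
        after = linecache.getline(module_filename, 1)
    finally:
        sys.path.remove(directory)
        linecache.clearcache()

print(repr(before), repr(after))
```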

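What is the `module_globals` parameter of linecache.getline for?

module_globals handles modules imported through PEP 302 import hooks. A module imported from a zip archive, for example, needs module_globals; otherwise the source lines cannot be retrieved: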

$ mkdir zipper
$ touch zipper/__init__.py
$ echo foobar = 10 >> zipper/__init__.py
$ zip -r zipper zipper
$ ls
zipper zipper.zip
$ mv zipper _zipper

>>> import sys
>>> import linecache
>>> sys.path.insert(0, 'zipper.zip')
>>> import zipper
>>> zipper.foobar
10
>>> linecache.getline(zipper.__file__, 1)  # Can't retrieve the line
''
>>> linecache.getline(zipper.__file__, 1, zipper.__dict__)
'foobar = 10\n'
>>> linecache.getline('zipper', 1, zipper.__dict__)
'foobar = 10\n'
>>>

See PEP 273 and PEP 302, and the doctest source code for a real-world use.
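
The same experiment can be reproduced without the shell by building the archive with the zipfile module. This is only a sketch: the archive name demo_archive.zip and the package name zipped_demo are invented for the example, and the archive lives in a temporary directory.

```python
import importlib
import linecache
import os
import shutil
import sys
import tempfile
import zipfile

# Build a zip archive containing a one-line package (names are hypothetical).
directory = tempfile.mkdtemp()
archive_path = os.path.join(directory, "demo_archive.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("zipped_demo/__init__.py", "foobar = 10\n")

sys.path.insert(0, archive_path)
try:
    importlib.invalidate_caches()
    zipped_demo = importlib.import_module("zipped_demo")

    # __file__ points inside the archive, so it cannot be opened as a plain file.
    without_globals = linecache.getline(zipped_demo.__file__, 1)
    linecache.clearcache()

    # With the module's globals, linecache can ask the module's loader for the source.
    with_globals = linecache.getline(zipped_demo.__file__, 1, zipped_demo.__dict__)
finally:
    sys.path.remove(archive_path)
    sys.modules.pop("zipped_demo", None)
    linecache.clearcache()
    shutil.rmtree(directory, ignore_errors=True)

print(repr(without_globals), repr(with_globals))
```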

How do I clear linecache's cache?

There are two ways: through the API, or by resetting the dict directly:

>>> import linecache
>>> linecache.clearcache()
>>> linecache.cache = {}

The two are equivalent, at least in the version of the source quoted here:

# The cache. Maps filenames to either a thunk which will provide source code,
# or a tuple (size, mtime, lines, fullname) once loaded.
cache = {}


def clearcache():
    """Clear the cache entirely."""

    global cache
    cache = {}
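
To see the effect of clearing, one can check whether a file's entry is present in linecache.cache before and after the call. A minimal sketch, assuming a temporary file (demo_path is a made-up name) and using the public clearcache() API:

```python
import linecache
import os
import tempfile

# Hypothetical one-line file used only to populate the cache.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("cached = True\n")
    demo_path = handle.name

linecache.getline(demo_path, 1)                  # loads the file into the cache
cached_before = demo_path in linecache.cache

linecache.clearcache()                           # drops every cached file
cached_after = demo_path in linecache.cache

os.remove(demo_path)
print(cached_before, cached_after)
```

Going through clearcache() avoids depending on how the module stores its cache internally.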

03. Discussions

Reading linecache itself shows a compact implementation. In the version discussed here, only three functions are exposed through __all__: getline, clearcache, and checkcache. The cached source lives in a dictionary named cache, with each file stored as a tuple (size, mtime, lines, fullname). File encoding is handled by tokenize.open(), which relies on tokenize.detect_encoding() to determine the encoding.
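
The (size, mtime, lines, fullname) tuple is what checkcache compares against the file on disk. The sketch below assumes a temporary file that is rewritten with a different size, so the stored size no longer matches; path and the other variable names are illustrative:

```python
import linecache
import os
import tempfile

# Hypothetical file that is cached and then rewritten with different content.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("value = 1\n")
    path = handle.name

linecache.getline(path, 1)
size, mtime, lines, fullname = linecache.cache[path]   # the cached entry

with open(path, "w") as handle:
    handle.write("value = 1000\n")                     # the file size changes

stale = linecache.getline(path, 1)    # still served from the cache
linecache.checkcache(path)            # size or mtime differs, so the entry is dropped
fresh = linecache.getline(path, 1)    # re-read from disk

print(lines, fullname == path)
print(repr(stale), repr(fresh))

os.remove(path)
linecache.clearcache()
```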

The core of the module is linecache.getlines:

def getlines(filename, module_globals=None):
    """Get the lines for a Python source file from the cache.
    Update the cache if it doesn't contain an entry for this file already."""

    if filename in cache:
        entry = cache[filename]
        if len(entry) != 1:
            return cache[filename][2]

    try:
        return updatecache(filename, module_globals)
    except MemoryError:
        clearcache()
        return []

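The function first checks whether the file is already in the cache and, if the entry is fully loaded (an entry of length 1 is a thunk that has not been loaded yet), returns its lines. Otherwise it calls updatecache inside a try/except that handles MemoryError. The rest of the module deals with special imports and the lazy cache.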
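
The length-1 check can be illustrated with linecache.lazycache, which registers a thunk without reading any source. This is a sketch under stated assumptions: InMemoryLoader, virtual_demo and virtual_demo_source are hypothetical names, the loader implements only the get_source() hook, and no file called virtual_demo_source exists in the working directory.

```python
import linecache


class InMemoryLoader:
    """Hypothetical loader exposing only the get_source() hook linecache uses."""

    def __init__(self, source):
        self.source = source

    def get_source(self, name):
        return self.source


# Hand-built module globals; a real module would supply these itself.
module_globals = {
    "__name__": "virtual_demo",
    "__loader__": InMemoryLoader("x = 1\ny = 2\n"),
}
filename = "virtual_demo_source"   # assumed not to exist on disk

linecache.clearcache()
registered = linecache.lazycache(filename, module_globals)
thunk_entry_length = len(linecache.cache[filename])    # 1: only a thunk so far

second_line = linecache.getline(filename, 2)           # triggers the actual load
loaded_entry_length = len(linecache.cache[filename])   # 4: (size, mtime, lines, fullname)

print(registered, thunk_entry_length, repr(second_line), loaded_entry_length)
linecache.clearcache()
```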

For the full source, see Lib/linecache.py.

04. References

PEP 273 – Import Modules from Zip Archives
PEP 302 – New Import Hooks
linecache – Read Text Files Efficiently – PyMOTW 3
