$1$ if $\widehat{loan} > 0$

$0$ if $\widehat{loan} \le 0$ (1.2)

$loan = W\gamma + \nu$ (1.3)

After estimating the probit model in equation 1.3, the fitted value, Lambda, can be derived from equation 1.4, where $\hat{\gamma}$ is a vector of estimated coefficients and $\Phi$ is the cumulative distribution function of the standard normal distribution.

$\lambda = \Phi(W\hat{\gamma})$ (1.4)
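As a minimal sketch of equation 1.4, the snippet below evaluates Lambda for each observation once the probit coefficients are available. The coefficient values, the covariate rows, and the function names are hypothetical and serve only as an illustration; the probit estimation itself is not shown.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Phi(x), the standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def selection_lambda(w_row, gamma_hat):
    """Lambda for one observation: Phi(w . gamma_hat), as in equation 1.4."""
    index = sum(w * g for w, g in zip(w_row, gamma_hat))
    return standard_normal_cdf(index)


# Hypothetical inputs: a constant plus two covariates, with made-up coefficients.
gamma_hat = [0.2, -0.5, 1.0]
W = [
    [1.0, 0.4, 0.1],
    [1.0, 1.2, 0.3],
]
lambdas = [selection_lambda(row, gamma_hat) for row in W]
```

Each resulting value lies strictly between 0 and 1, since it is a probability from the standard normal distribution.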

By including the variable Lambda in equation 1.1, equation 1.5 can mitigate selection bias.

However, endogeneity bias remains a problem, which requires two-stage least squares estimation with instrumental variables.

$RPT = \beta\,BFR + \sigma\lambda + u$ (1.5)

1.3.2 Two-stage least squares (2SLS)

In the conventional setup, 2SLS requires finding instrumental variables Z for the key explanatory variables, i.e., the bank-firm relationship proxies (BFR) in equation 1.6.

$BFR = Z\delta + \varepsilon$ (1.6)

By estimating equation 1.6, the fitted values of BFR are obtained and used as instrumented regressors to remove endogeneity bias. Substituting the instrumented BFR variables into equation 1.5 yields equation 1.7, which is the main equation estimated in the rest of the chapter. The dependent variable, RPT, can also be replaced by other explained variables while keeping the previous steps unchanged.

$RPT = \beta\,\widehat{BFR} + \sigma\lambda + u$ (1.7)
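The two stages in equations 1.6 and 1.7 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the estimation code used in the chapter: it uses a single instrument z, adds a constant to both stages, and includes Lambda among the first-stage regressors (common 2SLS practice, but not stated in the text). All function names and data values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(x_rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def two_stage_least_squares(rpt, bfr, lam, z):
    """Return (beta, sigma, constant) from a two-stage estimate."""
    # First stage (equation 1.6): BFR on a constant, Lambda, and the instrument.
    first_x = [[1.0, l, zi] for l, zi in zip(lam, z)]
    delta = ols(first_x, bfr)
    bfr_hat = [sum(d * v for d, v in zip(delta, row)) for row in first_x]
    # Second stage (equation 1.7): RPT on fitted BFR, Lambda, and a constant.
    second_x = [[b, l, 1.0] for b, l in zip(bfr_hat, lam)]
    beta, sigma, const = ols(second_x, rpt)
    return beta, sigma, const


# Hypothetical toy data with true beta = 2 and sigma = 3 and no second-stage error.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lam = [0.2, 0.9, 0.4, 0.7, 0.1, 0.5]
eps = [0.3, -0.2, 0.1, -0.4, 0.5, -0.3]
bfr = [1.0 + 0.5 * zi + e for zi, e in zip(z, eps)]
rpt = [2.0 * b + 3.0 * l for b, l in zip(bfr, lam)]
beta_hat, sigma_hat, const_hat = two_stage_least_squares(rpt, bfr, lam, z)
```

Because the toy data contain no second-stage error, the estimates should come back close to 2 and 3 up to rounding. The sketch returns point estimates only; standard errors from a manual second stage are not valid without correction.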

The rest of the chapter starts by introducing the estimation of the detailed probit model in equation 1.3 and its required data. Based on the estimation result, the variable Lambda is generated. Next, the detailed first-stage estimation of equation 1.6 is presented together with the related data and variables. Finally, by applying the fitted BFR variables and the correction variable Lambda, the result sections investigate the actual RPT-BFR relationship and the channels through which the relationship operates.

1.4 Data and preliminary estimations

1.4.1 Key databases and variables

Commercial loan contract terms of Chinese listed firms

To conduct this research, datasets from several different sources are merged to construct the different stages of analysis. The central data for all the analyses in this chapter are the loan contract data, and the rest of the dataset is illustrated along with each regression model.

A firm begins to build a relationship with a bank when a commercial loan contract is agreed upon. Thus, to study how a bank can impact firm-level activities, all analyses are based on loan deals, and having more details on such contracts allows for more in-depth investigation. Unfortunately, the public financial databases that the author checked do not provide this information, nor does any previous study report using a commercial loan dataset for Chinese listed firms.

Qian and Yeung (2015a) access a private and detailed sample of loans from one Chinese national bank whose borrowers are mainly private firms. However, to detect tunneling activities, audited financial statements and internal transactions are required, and private firms do not provide such information publicly.

Bailey et al. (2011a) try to search all bank loan-related announcements on the website of the China Securities Regulatory Commission (CSRC). Although the China Stock Market & Accounting Research (CSMAR) Database, from a Shenzhen-based data vendor (GTA), has built this public announcement database, it is missing a large number of observations, for three reasons. First, disclosure regulations require listed firms to announce only important events that may significantly affect their stock prices. Listed firms, therefore, only selectively announce loans above a certain value. Second, corporate governance policies in some firms allow the CEO or chair to sign borrowing contracts without shareholder approval as long as the amount is below a given threshold. Third, most loan-related public announcements state the overall amount of loans or credit lines after the signing of an agreement with a bank, but the number in the announcement is not the annually realized value of the commercial loan.

In the absence of a Chinese corporate loan database counterpart to the Loan Pricing Corporation’s (LPC’s) DealScan database for US firms, the author designed an algorithm to extract loan information from firms’ annual financial reports. Chinese listed firms are encouraged but not required to disclose their top short- and long-term borrowings by size.

Such information is not limited to bank loans but also includes sources from trusts and even from individuals. Thus, most listed firms report the top 5 or 10 or even all commercial loans in the additional notes of their annual financial statements. Based on the annual report PDFs, the algorithm extracts bank-loan observations, including the exact lending bank’s name (not the top-level name of a commercial bank but the exact branch), the loan amount, the tenor, the interest rates or the rules for calculating interest, and the currency of the loan. This information covers the essential terms of a loan contract. Some firms even provide further details, such as the collateral used for the loan contract or any guarantors of the loan. The task is difficult because firms’ annual reports follow no consistent format, which greatly complicates data collection.

Although this collected dataset does not cover all commercial loans from all listed firms and these variables are not always available for each loan contract in all annual reports, the loan dataset offers a good representation of the Chinese commercial loan market. To the best of our knowledge, this is the first work to collect and study a dataset on Chinese commercial loan contracts.

Overall, the algorithm collects over 100,000 pieces of borrowing information. However, since this chapter only studies new loans, the algorithm further identifies a new loan contract as one with a zero loan amount at the beginning of the financial year and a nonzero amount at the end of the year. Borrowing information with a nonzero starting amount reports only repayment results rather than the initiation of a new loan contract. After this filter is applied, the sample contains approximately 30,000 new loan contracts agreed from 2001 to 2015. Comparing the actual cash borrowing events with the extracted borrowing events, where an extracted event is defined as at least one loan contract captured from the same firm’s financial report in the same year, this dataset covers 96.4% of borrowing decisions. Since listed firms only disclose their top loan contracts, this dataset captures 83.4% of the borrowing amount, compared to the “new borrowing” aggregate item in financial reports. Overall, this lending contract dataset is relatively representative.
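The new-loan filter and the event-level coverage measure described above can be sketched as follows. The record layout, field names, and toy values are hypothetical and are not taken from the actual extraction algorithm; the sketch only illustrates the two rules stated in the text.

```python
def is_new_loan(opening_amount: float, closing_amount: float) -> bool:
    """A new loan has a zero opening amount and a nonzero closing amount."""
    return opening_amount == 0 and closing_amount != 0


def event_coverage(borrowing_firm_years, new_loans):
    """Share of actual borrowing firm-years with at least one captured loan."""
    captured = {(loan["firm"], loan["year"]) for loan in new_loans}
    hits = sum(1 for key in borrowing_firm_years if key in captured)
    return hits / len(borrowing_firm_years)


# Hypothetical extracted borrowing records (amounts in arbitrary units).
records = [
    {"firm": "A", "year": 2010, "opening": 0.0, "closing": 500.0},
    {"firm": "A", "year": 2010, "opening": 300.0, "closing": 100.0},  # repayment only
    {"firm": "B", "year": 2011, "opening": 0.0, "closing": 0.0},
    {"firm": "C", "year": 2012, "opening": 0.0, "closing": 200.0},
]
new_loans = [r for r in records if is_new_loan(r["opening"], r["closing"])]

# Hypothetical firm-years in which cash borrowing actually occurred.
borrowing_firm_years = {("A", 2010), ("B", 2011), ("C", 2012), ("D", 2012)}
share = event_coverage(borrowing_firm_years, new_loans)
```

In this toy example, two of the four borrowing firm-years have at least one captured new loan, so the share is 0.5.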

Bank branch profile

In addition to the specific bank branch details in the loan contract information, the analysis also requires bank information for all branch levels, such as a bank branch’s name, its level, and its address, to identify its geographic location, bank-firm relationship, and other bank-related measurements.

Since the establishment of any bank branch requires a banking certification from the China Banking Regulatory Commission (CBRC), the full Chinese bank branch data are scraped from the CBRC website. This dataset covers all Chinese financial institutions, including not only commercial banks but also rural credit cooperatives, policy banks, and financial firms. The variables include a bank branch’s name, address, foundation date, branch level within the bank hierarchy, bank category, and bank city code. The address of each bank branch can be used to locate its latitude and longitude and calculate the distance to another set of coordinates. Matching the bank names with the lender name in each loan contract via an algorithm completes the linkage between a firm and a bank branch through a loan contract.
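The distance calculation between two sets of coordinates can be illustrated with the haversine great-circle formula. The text does not state which distance formula is used, so both the formula and the Earth radius constant below are assumptions made for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2.0) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2
    )
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
```

Applied to a geocoded bank branch and a geocoded firm address, the same function would give the bank-firm distance in kilometres.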

Firm financial characteristics

Firm-level financial control variables were obtained from the database of RESSET, a Beijing data vendor specializing in financial data. The regression models in the following analysis share a set of firm-level control variables that are defined in the same way but calculated over different periods. Size is the natural logarithm of the book value of the total assets in the same year. Leverage is the ratio of long-term debt over total assets in the same calendar year. Cash is the total cash in proportion to the lagged total assets. Capital expenditure is the capital expenditure in cash-flow statements divided by lagged total assets. ROA (return on assets) is income divided by lagged total assets.
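The control-variable definitions above translate directly into a small helper. The dictionary keys and the example figures are hypothetical and do not correspond to RESSET field names.

```python
import math


def firm_controls(current: dict, lagged: dict) -> dict:
    """Compute the controls from current-year and prior-year statement items."""
    return {
        "size": math.log(current["total_assets"]),
        "leverage": current["long_term_debt"] / current["total_assets"],
        "cash": current["cash"] / lagged["total_assets"],
        "capex": current["capital_expenditure"] / lagged["total_assets"],
        "roa": current["income"] / lagged["total_assets"],
    }


# Hypothetical statement items for one firm-year.
current_year = {
    "total_assets": 1000.0,
    "long_term_debt": 250.0,
    "cash": 80.0,
    "capital_expenditure": 40.0,
    "income": 60.0,
}
previous_year = {"total_assets": 800.0}
controls = firm_controls(current_year, previous_year)
```

Note that Size and Leverage use same-year total assets, whereas Cash, capital expenditure, and ROA are scaled by lagged total assets.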

The market-to-book ratio is the stock price at the end of the year multiplied by total shares and divided by total book equity value. The construction of the Kaplan-Zingales (KZ) index follows the methodology of Baker et al. (2003) to measure the likelihood that the firm faces financial constraints. The firm age is calculated as the number of years since the firm was founded. The SOE dummy (1 for SOE and 0 for non-SOE) represents state-owned enterprises, which account for a significant proportion of Chinese listed firms. Controlling for these firm characteristics helps to isolate the direct impact of the bank-firm relationship on firm activities that is not due to other factors, e.g., asymmetric information related to size, leverage, or profitability (Frank and Goyal, 2003a) or extra investment opportunities from cash and capital expenditure. In addition, year and industry dummies are used to further capture fixed effects. The year fixed effects cover 2001 to 2015, the year in which the latest loan contract was collected. Since both dependent and independent variables at the firm level are calculated as averages across the loan contract duration and some loan contracts last beyond 2015, the study also collects financial statements until 2019 and only considers loan contracts that terminated before the end of 2019. Firm industry information is obtained from RESSET’s basic firm information table. The definition of the industry fixed effects follows the second level of the Global Industry Classification Standard (GICS) categories, ranging from 1010 to 5510, for a total of 22 levels. Finally, the sample of interest only includes nonfinancial listed firms within business groups.
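The averaging of firm-level variables across the loan contract duration, together with the 2019 cutoff, can be sketched as below. The inclusive treatment of the start and end years and the skipping of years without data are assumptions of this sketch, as are the function name and the example values.

```python
LAST_STATEMENT_YEAR = 2019  # financial statements are collected until 2019


def average_over_contract(values_by_year: dict, start_year: int, end_year: int):
    """Average a firm-level variable over the contract years (inclusive).

    Returns None when the contract ends after the last statement year or
    when no yearly value is available (both are assumptions of this sketch).
    """
    if end_year > LAST_STATEMENT_YEAR:
        return None
    observed = [
        values_by_year[year]
        for year in range(start_year, end_year + 1)
        if year in values_by_year
    ]
    if not observed:
        return None
    return sum(observed) / len(observed)


# Hypothetical yearly leverage values for one firm.
leverage_by_year = {2014: 0.10, 2015: 0.20, 2016: 0.30, 2017: 0.40}
avg_leverage = average_over_contract(leverage_by_year, 2014, 2016)
excluded = average_over_contract(leverage_by_year, 2015, 2020)
```

A contract signed in 2015 that runs past 2019 is dropped, which mirrors the restriction to loan contracts terminated before the end of 2019.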