Web Server Performance and Stress Testing Tools: A Guide to http_load, webbench, ab and Siege


http_load

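The program is very small: under 100 KB even after unpacking.

http_load runs multiple HTTP fetches in parallel, multiplexed within one process, to test a web server's throughput and load. Unlike most stress-testing tools it runs as a single process, so it generally does not bring the client machine down. It can also test HTTPS sites.

Download: http://soft.vpser.net/test/http_load/http_load-12mar2006.tar.gz

Installation is simple: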

#tar zxvf http_load-12mar2006.tar.gz
#cd http_load-12mar2006
#make && make install
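Command format: http_load -p <concurrent connections> -s <duration in seconds> <URL file>

Every option has a long and a short form, so http_load -parallel 5 -seconds 300 urls.txt works just as well. Each run needs one start condition (-parallel or -rate) and one end condition (-fetches or -seconds). Here is a brief description of the options:
-parallel (short: -p): number of concurrent connections, i.e. simulated users.
-fetches (short: -f): total number of fetches to perform.
-rate (short: -r): number of fetches to start per second.
-seconds (short: -s): total duration of the test, in seconds.
Prepare a URL file, urllist.txt, with one URL per line. Results are best with more than 50 to 100 URLs. The file looks like this: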

http://www.linuxde.net/uncategorized/choose-vps.html

http://www.linuxde.net/vps-cp/hypervm-tutorial.html


http://www.linuxde.net/coupons/diavps-april-coupons.html


http://www.linuxde.net/security/vps-backup-web-mysql.html
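As a rough sketch of preparing such a file, the Python snippet below keeps only lines that look like http(s) URLs and writes them one per line. The helper names clean_urls and write_url_list, and the filtering rule itself, are assumptions made for illustration; they are not part of http_load.

```python
from urllib.parse import urlparse


def clean_urls(lines):
    """Return the http(s) URLs found in `lines`, with blank lines dropped."""
    urls = []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue  # skip empty lines
        parts = urlparse(candidate)
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(candidate)
    return urls


def write_url_list(path, lines):
    """Write the cleaned URLs to `path`, one per line, and return the count."""
    urls = clean_urls(lines)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")
    return len(urls)


sample = [
    "http://www.linuxde.net/uncategorized/choose-vps.html",
    "",
    "http://www.linuxde.net/vps-cp/hypervm-tutorial.html",
    "not a url",
]
print(clean_urls(sample))
```

Calling write_url_list("urllist.txt", sample) would then produce a file that can be passed to http_load as the last argument.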

For example:

http_load -p 30 -s 60  urllist.txt
With the options covered, let's run a command and look at what it returns.

Command: % ./http_load -rate 5 -seconds 10 urls
This runs a test lasting 10 seconds at a rate of 5 fetches per second.

49 fetches, 2 max parallel, 289884 bytes, in 10.0148 seconds
5916 mean bytes/connection
4.89274 fetches/sec, 28945.5 bytes/sec
msecs/connect: 28.8932 mean, 44.243 max, 24.488 min
msecs/first-response: 63.5362 mean, 81.624 max, 57.803 min
HTTP response codes:
  code 200 -- 49
Analysis of the results:

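49 fetches, 2 max parallel, 289884 bytes, in 10.0148 seconds: the test completed 49 requests, with at most 2 connections open in parallel, and transferred 289884 bytes in total over 10.0148 seconds.
5916 mean bytes/connection: the average amount of data transferred per connection, 289884/49 = 5916.
4.89274 fetches/sec, 28945.5 bytes/sec: the server answered 4.89274 requests per second and delivered 28945.5 bytes per second.
msecs/connect: 28.8932 mean, 44.243 max, 24.488 min: establishing a connection took 28.8932 ms on average, 44.243 ms at most and 24.488 ms at least.
msecs/first-response: 63.5362 mean, 81.624 max, 57.803 min: the first response data arrived after 63.5362 ms on average, 81.624 ms at most and 57.803 ms at least.
HTTP response codes: code 200 -- 49: the status codes of the responses. If there are many 403 responses, check whether the system has hit a bottleneck.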
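The derived figures in the report can be recomputed from its first line. The sketch below assumes the report layout shown in this article; the function names parse_summary and derived_metrics are made up for the example.

```python
import re

REPORT = """\
49 fetches, 2 max parallel, 289884 bytes, in 10.0148 seconds
5916 mean bytes/connection
4.89274 fetches/sec, 28945.5 bytes/sec
"""


def parse_summary(report):
    """Extract fetches, total bytes and elapsed seconds from the summary line."""
    match = re.search(
        r"(\d+) fetches, (\d+) max parallel, (\d+) bytes, in ([\d.]+) seconds",
        report,
    )
    if match is None:
        raise ValueError("summary line not found")
    fetches, _max_parallel, total_bytes, seconds = match.groups()
    return int(fetches), int(total_bytes), float(seconds)


def derived_metrics(fetches, total_bytes, seconds):
    """Recompute the per-connection and per-second figures."""
    return {
        "mean_bytes_per_connection": total_bytes / fetches,
        "fetches_per_sec": fetches / seconds,
        "bytes_per_sec": total_bytes / seconds,
    }


metrics = derived_metrics(*parse_summary(REPORT))
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The recomputed values agree with the report up to the rounding of the elapsed time.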
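Notes:

The key metrics in the results are fetches/sec and msecs/connect. fetches/sec is the number of requests the server can answer per second, and it is the figure used to measure performance. It seems to be somewhat more accurate, and more convincing, than the figures from Apache's ab. In short, the two values to watch are Qpt, the number of users served per second, and response time, the time taken to serve each connection. These two metrics alone are not enough for a full performance analysis, of course; you also need to look at the server's CPU and memory usage before drawing conclusions.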

webbench

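webbench is a website stress-testing tool for Linux that can simulate up to 30,000 concurrent connections to test how much load a site can take. Download links are easy to find with Google; here is one:

Download: http://soft.vpser.net/test/webbench/webbench-1.5.tar.gz

This program is even smaller: under 50 KB after unpacking.

Installation is very simple: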

#tar zxvf webbench-1.5.tar.gz
#cd webbench-1.5
#make && make install
This builds the webbench executable in the current directory, ready to use.

Usage:

webbench -c <concurrency> -t <test duration in seconds> URL

For example:

webbench -c 5000 -t 120 http://www.linuxde.net/
ab

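ab is a powerful testing tool that ships with Apache, so it is usually already present once Apache is installed. Running it without arguments prints its usage: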

$ ./ab
./ab: wrong number of arguments
Usage: ./ab [options] [http://]hostname[:port]/path
Options are:
-n requests Number of requests to perform
-c concurrency Number of multiple requests to make
-t timelimit Seconds to max. wait for responses
-p postfile File containing data to POST
-T content-type Content-type header for POSTing
-v verbosity How much troubleshooting info to print
-w Print out results in HTML tables
-i Use HEAD instead of GET
-x attributes String to insert as table attributes
-y attributes String to insert as tr attributes
-z attributes String to insert as td or th attributes
-C attribute Add cookie, eg. 'Apache=1234. (repeatable)
-H attribute Add Arbitrary header line, eg. 'Accept-Encoding: gzip'
Inserted after all normal header lines. (repeatable)
-A attribute Add Basic WWW Authentication, the attributes
are a colon separated username and password.
-P attribute Add Basic Proxy Authentication, the attributes
are a colon separated username and password.
-X proxy:port Proxyserver and port number to use
-V Print version number and exit
-k Use HTTP KeepAlive feature
-d Do not show percentiles served table.
-S Do not show confidence estimators and warnings.
-g filename Output collected data to gnuplot format file.
-e filename Output CSV file with percentages served
-h Display usage information (this message)
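There are many options; the ones used most often are -n and -c.

For example:

./ab -c 100 -n 1000 http://www.linuxde.net/index.php
This requests index.php 1000 times in total, with 100 requests in flight at a time. The concurrency (-c) must not exceed the number of requests (-n), otherwise ab refuses to run.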
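Because ab rejects a concurrency level greater than the total number of requests, it can help to check the two values before launching a run. The following is a minimal sketch; build_ab_command is a hypothetical helper that only assembles the argument list and does not start ab itself.

```python
def build_ab_command(url, requests, concurrency, ab_path="ab"):
    """Return the argv list for an ab run, rejecting invalid combinations."""
    if requests < 1 or concurrency < 1:
        raise ValueError("requests and concurrency must be positive")
    if concurrency > requests:
        raise ValueError("concurrency (-c) must not exceed requests (-n)")
    return [ab_path, "-n", str(requests), "-c", str(concurrency), url]


print(build_ab_command("http://www.linuxde.net/index.php", 1000, 100))
```

The returned list can be handed to subprocess.run on a machine where ab is installed.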

Siege

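Siege is an open-source stress-testing tool. Based on its configuration it makes many simulated users access a web site concurrently, records the response time of every request each user makes, and repeats the run at a given level of concurrency.

Official site: http://www.joedog.org/
Siege download: http://soft.vpser.net/test/siege/siege-2.67.tar.gz
Unpack: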

# tar -zxf siege-2.67.tar.gz
Enter the unpacked directory:

# cd siege-2.67/
Install:

#./configure ; make
#make install
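Usage:

siege -c 200 -r 10 -f example.url
-c is the concurrency and -r is the number of repetitions. The URL file is a plain text file with one URL per line. By default siege works through the file in order; with the -i option it picks URLs from it at random.

Contents of example.url: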

http://www.licess.cn

http://www.linuxde.net


http://soft.vpser.net
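Explanation of the results:

Lifting the server siege… done.
Transactions: 3419263 hits //3419263 transactions completed
Availability: 100.00 % //success rate of 100.00 %
Elapsed time: 5999.69 secs //total time taken
Data transferred: 84273.91 MB //84273.91 MB transferred in total
Response time: 0.37 secs //average response time of 0.37 seconds; reflects the speed of the network connection
Transaction rate: 569.91 trans/sec //569.91 transactions completed per second on average
Throughput: 14.05 MB/sec //average amount of data transferred per second
Concurrency: 213.42 //average number of simultaneous connections
Successful transactions: 2564081 //number of successful transactions
Failed transactions: 11 //number of failed transactions
Longest transaction: 29.04 //longest time taken by a single transaction
Shortest transaction: 0.00 //shortest time taken by a single transaction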
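The rate figures in this report follow from the raw totals, which makes for a quick sanity check. The sketch below recomputes them in Python. The concurrency estimate uses Little's law (requests in flight = rate × response time), which is an assumption introduced here and only approximates the reported value, since the response time is rounded to two decimals.

```python
transactions = 3419263        # "Transactions" line, in hits
elapsed_secs = 5999.69        # "Elapsed time"
data_mb = 84273.91            # "Data transferred"
response_time_secs = 0.37     # "Response time"

transaction_rate = transactions / elapsed_secs   # transactions per second
throughput = data_mb / elapsed_secs              # MB per second
# Little's law: average requests in flight = arrival rate * time in system.
estimated_concurrency = transaction_rate * response_time_secs

print(f"Transaction rate: {transaction_rate:.2f} trans/sec")
print(f"Throughput: {throughput:.2f} MB/sec")
print(f"Estimated concurrency: {estimated_concurrency:.2f}")
```

The first two results reproduce the reported 569.91 trans/sec and 14.05 MB/sec, and the estimate lands close to the reported concurrency of 213.42.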

10 Linux Strip Command Examples (Reduce Executable/Binary File Size)


The strip command is mostly used when you want a production-quality object file that contains only the minimum required information, so that it stays lightweight. You can also use it to make your executable or object file harder to reverse engineer.
In this article, we will understand the usage of this command through some practical examples.

The syntax of strip command is :
strip [options] objfile...

Examples

Before jumping on to the examples, here is the code behind the executable that we would be using in this article.
#include <stdio.h>

// Declare a static global
static int i=10;
// Declare a non static global
int global = 20;

int inc_func()
{
    static int local = 0;
    // return static local value
    return (++local);
}

int main(void)
{
    int count = inc_func();
    // Print the sum
    printf( "\n [%d] \n",(count + global + i));

    return 0;
}
Please note that the nm command, which we mentioned in our Reverse Engineering Tools in Linux article, reports no symbols for an executable whose symbol table has been removed with strip (nm -D can still list the dynamic symbols).

1. Strip the symbol table using -s option

The symbol table can be stripped from an object file using -s option of strip command.
Consider the following example :
$ readelf -s example

Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)

Symbol table '.symtab' contains 69 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000400238     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000400254     0 SECTION LOCAL  DEFAULT    2
     ..
    28: 000000000040046c     0 FUNC    LOCAL  DEFAULT   14 call_gmon_start
    29: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    30: 0000000000600e18     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_LIST__
    31: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   20 __DTOR_LIST__
    32: 0000000000600e38     0 OBJECT  LOCAL  DEFAULT   21 __JCR_LIST__
    33: 0000000000400490     0 FUNC    LOCAL  DEFAULT   14 __do_global_dtors_aux
    34: 0000000000601028     1 OBJECT  LOCAL  DEFAULT   26 completed.7382
    35: 0000000000601030     8 OBJECT  LOCAL  DEFAULT   26 dtor_idx.7384
    36: 0000000000400500     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    38: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_END__
    39: 0000000000400750     0 OBJECT  LOCAL  DEFAULT   18 __FRAME_END__
    40: 0000000000600e38     0 OBJECT  LOCAL  DEFAULT   21 __JCR_END__
    41: 0000000000400630     0 FUNC    LOCAL  DEFAULT   14 __do_global_ctors_aux
    42: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS example.c
    43: 0000000000601020     4 OBJECT  LOCAL  DEFAULT   25 i
    44: 0000000000601038     4 OBJECT  LOCAL  DEFAULT   26 local.2047
    45: 0000000000600fe8     0 OBJECT  LOCAL  HIDDEN   24 _GLOBAL_OFFSET_TABLE_
    46: 0000000000600e14     0 NOTYPE  LOCAL  HIDDEN   19 __init_array_end
    47: 0000000000600e14     0 NOTYPE  LOCAL  HIDDEN   19 __init_array_start
    48: 0000000000600e40     0 OBJECT  LOCAL  HIDDEN   22 _DYNAMIC
    49: 0000000000601010     0 NOTYPE  WEAK   DEFAULT   25 data_start
    50: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.2.5
    51: 0000000000400590     2 FUNC    GLOBAL DEFAULT   14 __libc_csu_fini
    52: 0000000000400440     0 FUNC    GLOBAL DEFAULT   14 _start
    53: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    54: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    55: 0000000000400668     0 FUNC    GLOBAL DEFAULT   15 _fini
    56: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    57: 0000000000601024     4 OBJECT  GLOBAL DEFAULT   25 global
    58: 0000000000400678     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used
    59: 0000000000601010     0 NOTYPE  GLOBAL DEFAULT   25 __data_start
    60: 0000000000601018     0 OBJECT  GLOBAL HIDDEN   25 __dso_handle
    61: 0000000000600e30     0 OBJECT  GLOBAL HIDDEN   20 __DTOR_END__
    62: 00000000004005a0   137 FUNC    GLOBAL DEFAULT   14 __libc_csu_init
    63: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    64: 0000000000601040     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    65: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    66: 0000000000400524    27 FUNC    GLOBAL DEFAULT   14 inc_func
    67: 000000000040053f    67 FUNC    GLOBAL DEFAULT   14 main
    68: 00000000004003f0     0 FUNC    GLOBAL DEFAULT   12 _init
The above output shows the symbols that the executable contains initially. Now let's strip the symbol table using the -s option and look at the output again:
$ strip -s example
$ readelf -s example

Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
As can be seen clearly from the above output, the complete .symtab symbol table was stripped off. Only .dynsym remains, because the dynamic linker needs it at run time.
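To check the effect without reading the whole listing, the table headers in the readelf -s output can be counted. This is a small sketch that works on the text of the output; the helper name symbol_table_sizes is made up for the example, and the two strings are the header lines from the listings above.

```python
import re

HEADER = re.compile(r"Symbol table '([^']+)' contains (\d+) entr(?:y|ies):")


def symbol_table_sizes(readelf_output):
    """Map each symbol table name to the entry count readelf reports."""
    return {name: int(count) for name, count in HEADER.findall(readelf_output)}


before = """\
Symbol table '.dynsym' contains 4 entries:
Symbol table '.symtab' contains 69 entries:
"""
after = """\
Symbol table '.dynsym' contains 4 entries:
"""

print(symbol_table_sizes(before))
print(symbol_table_sizes(after))
```

On a real binary the text could come from subprocess.run(["readelf", "-s", "example"], capture_output=True, text=True).stdout.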

2. Remove debug symbols only using --strip-debug option

Consider the following example :
$ strip --strip-debug example
Now let's check the symbol table (partial output shown below):
$ readelf -a example
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400440
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4464 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 28

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 5] .gnu.hash         GNU_HASH         00000000004002c0  000002c0
       000000000000001c  0000000000000000   A       6     0     8
  [29] .symtab           SYMTAB           0000000000000000  00001930
       0000000000000630  0000000000000018          30    46     8
  [30] .strtab           STRTAB           0000000000000000  00001f60
       00000000000001fd  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  ..
                0x0000000000000000 0x0000000000000000  RW     8
  GNU_RELRO      0x0000000000000e18 0x0000000000600e18 0x0000000000600e18
                 0x00000000000001e8 0x00000000000001e8  R      1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .ctors .dtors .jcr .dynamic .got 

Dynamic section at offset 0xe40 contains 21 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 ..
 0x0000000000000000 (NULL)               0x0

Relocation section '.rela.dyn' at offset 0x3a8 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600fe0  000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.plt' at offset 0x3c0 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601000  000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
000000601008  000300000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0

There are no unwind sections in this file.

Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)

Symbol table '.symtab' contains 66 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 000000000040046c     0 FUNC    LOCAL  DEFAULT   14 call_gmon_start
     2: 0000000000600e18     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_LIST__
     ...
    61: 0000000000601040     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    62: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    63: 0000000000400524    27 FUNC    GLOBAL DEFAULT   14 inc_func
    64: 000000000040053f    67 FUNC    GLOBAL DEFAULT   14 main
    65: 00000000004003f0     0 FUNC    GLOBAL DEFAULT   12 _init

Histogram for bucket list length (total of 3 buckets):
 Length  Number     % of total  Coverage
      0  0          (  0.0%)
      1  3          (100.0%)    100.0%

..

Notes at offset 0x00000254 with length 0x00000020:
  Owner  Data size Description
  GNU  0x00000010 NT_GNU_ABI_TAG (ABI version tag)

Notes at offset 0x00000274 with length 0x00000024:
  Owner  Data size Description
  GNU  0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Now if we compare the above output with the non-stripped output of the same file, we see that the FILE entries for crtstuff.c and example.c have been stripped off. In the original symbol table they appeared here:
...
    36: 0000000000400500     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    38: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_END__
    39: 0000000000400750     0 OBJECT  LOCAL  DEFAULT   18 __FRAME_END__
    40: 0000000000600e38     0 OBJECT  LOCAL  DEFAULT   21 __JCR_END__
    41: 0000000000400630     0 FUNC    LOCAL  DEFAULT   14 __do_global_ctors_aux
    42: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS example.c
    43: 0000000000601020     4 OBJECT  LOCAL  DEFAULT   25 i
    44: 0000000000601038     4 OBJECT  LOCAL  DEFAULT   26 local.2047
    45: 0000000000600fe8     0 OBJECT  LOCAL  HIDDEN   24 _GLOBAL_OFFSET_TABLE_
    46: 0000000000600e14     0 NOTYPE  LOCAL  HIDDEN   19 __init_array_end
...
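Comparing two long symbol listings by eye is error-prone, so a set difference over the parsed entries can help. The sketch below assumes the column layout of readelf -s shown in this article. The named_symbols helper is hypothetical, and the two excerpts are shortened for illustration; the entry numbers in the second excerpt are made up.

```python
import re

# Columns: Num, Value, Size, Type, Bind, Vis, Ndx, Name (Name may be empty).
SYMBOL_LINE = re.compile(
    r"^\s*\d+:\s+[0-9a-fA-F]+\s+\d+\s+(\w+)\s+\w+\s+\w+\s+\S+[ \t]*(.*)$"
)


def named_symbols(readelf_output):
    """Return a set of (type, name) pairs for every named symbol line."""
    found = set()
    for line in readelf_output.splitlines():
        match = SYMBOL_LINE.match(line)
        if match and match.group(2):
            found.add((match.group(1), match.group(2).strip()))
    return found


before = """\
    36: 0000000000400500     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    42: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS example.c
    43: 0000000000601020     4 OBJECT  LOCAL  DEFAULT   25 i
"""
after = """\
    33: 0000000000400500     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    38: 0000000000601020     4 OBJECT  LOCAL  DEFAULT   25 i
"""

removed = named_symbols(before) - named_symbols(after)
print(sorted(removed))
```

With these excerpts the difference consists of the two FILE entries, matching the comparison described above.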

3. Remove a particular section using -R option

If required, a complete section can be explicitly removed using the -R option.
Consider the following example :
Here, first we check all the section headers in non-stripped version of executable :
$ readelf -S example
There are 31 section headers, starting at offset 0x1170:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .hash             HASH             0000000000400298  00000298
       0000000000000024  0000000000000004   A       6     0     8
  [ 5] .gnu.hash         GNU_HASH         00000000004002c0  000002c0
       000000000000001c  0000000000000000   A       6     0     8
  [ 6] .dynsym           DYNSYM           00000000004002e0  000002e0
       0000000000000060  0000000000000018   A       7     1     8
  [ 7] .dynstr           STRTAB           0000000000400340  00000340
       000000000000003f  0000000000000000   A       0     0     1
  [ 8] .gnu.version      VERSYM           0000000000400380  00000380
       0000000000000008  0000000000000002   A       6     0     2
  [ 9] .gnu.version_r    VERNEED          0000000000400388  00000388
       0000000000000020  0000000000000000   A       7     1     8
  [10] .rela.dyn         RELA             00000000004003a8  000003a8
       0000000000000018  0000000000000018   A       6     0     8
  [11] .rela.plt         RELA             00000000004003c0  000003c0
       0000000000000030  0000000000000018   A       6    13     8
  [12] .init             PROGBITS         00000000004003f0  000003f0
       0000000000000018  0000000000000000  AX       0     0     4
  [13] .plt              PROGBITS         0000000000400408  00000408
       0000000000000030  0000000000000010  AX       0     0     4
  [14] .text             PROGBITS         0000000000400440  00000440
       0000000000000228  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         0000000000400668  00000668
       000000000000000e  0000000000000000  AX       0     0     4
  [16] .rodata           PROGBITS         0000000000400678  00000678
       000000000000000d  0000000000000000   A       0     0     4
  [17] .eh_frame_hdr     PROGBITS         0000000000400688  00000688
       000000000000002c  0000000000000000   A       0     0     4
  [18] .eh_frame         PROGBITS         00000000004006b8  000006b8
       000000000000009c  0000000000000000   A       0     0     8
  [19] .ctors            PROGBITS         0000000000600e18  00000e18
       0000000000000010  0000000000000000  WA       0     0     8
  [20] .dtors            PROGBITS         0000000000600e28  00000e28
       0000000000000010  0000000000000000  WA       0     0     8
  [21] .jcr              PROGBITS         0000000000600e38  00000e38
       0000000000000008  0000000000000000  WA       0     0     8
  [22] .dynamic          DYNAMIC          0000000000600e40  00000e40
       00000000000001a0  0000000000000010  WA       7     0     8
  [23] .got              PROGBITS         0000000000600fe0  00000fe0
       0000000000000008  0000000000000008  WA       0     0     8
  [24] .got.plt          PROGBITS         0000000000600fe8  00000fe8
       0000000000000028  0000000000000008  WA       0     0     8
  [25] .data             PROGBITS         0000000000601010  00001010
       0000000000000018  0000000000000000  WA       0     0     8
  [26] .bss              NOBITS           0000000000601028  00001028
       0000000000000018  0000000000000000  WA       0     0     8
  [27] .comment          PROGBITS         0000000000000000  00001028
       0000000000000048  0000000000000001  MS       0     0     1
  [28] .shstrtab         STRTAB           0000000000000000  00001070
       00000000000000fe  0000000000000000           0     0     1
  [29] .symtab           SYMTAB           0000000000000000  00001930
       0000000000000678  0000000000000018          30    49     8
  [30] .strtab           STRTAB           0000000000000000  00001fa8
       0000000000000212  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
Now, let's strip the .gnu.version section from the executable :
$ strip -R .gnu.version example
Now, if we cross check the list of sections :
$ readelf -S example
There are 28 section headers, starting at offset 0x1158:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .hash             HASH             0000000000400298  00000298
       0000000000000024  0000000000000004   A       6     0     8
  [ 5] .gnu.hash         GNU_HASH         00000000004002c0  000002c0
       000000000000001c  0000000000000000   A       6     0     8
  [ 6] .dynsym           DYNSYM           00000000004002e0  000002e0
       0000000000000060  0000000000000018   A       7     1     8
  [ 7] .dynstr           STRTAB           0000000000400340  00000340
       000000000000003f  0000000000000000   A       0     0     1
  [ 8] .gnu.version_r    VERNEED          0000000000400388  00000388
       0000000000000020  0000000000000000   A       7     1     8
  [ 9] .rela.dyn         RELA             00000000004003a8  000003a8
       0000000000000018  0000000000000018   A       6     0     8
  [10] .rela.plt         RELA             00000000004003c0  000003c0
       0000000000000030  0000000000000018   A       6    12     8
  [11] .init             PROGBITS         00000000004003f0  000003f0
       0000000000000018  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         0000000000400408  00000408
       0000000000000030  0000000000000010  AX       0     0     4
  [13] .text             PROGBITS         0000000000400440  00000440
       0000000000000228  0000000000000000  AX       0     0     16
  [14] .fini             PROGBITS         0000000000400668  00000668
       000000000000000e  0000000000000000  AX       0     0     4
  [15] .rodata           PROGBITS         0000000000400678  00000678
       000000000000000d  0000000000000000   A       0     0     4
  [16] .eh_frame_hdr     PROGBITS         0000000000400688  00000688
       000000000000002c  0000000000000000   A       0     0     4
  [17] .eh_frame         PROGBITS         00000000004006b8  000006b8
       000000000000009c  0000000000000000   A       0     0     8
  [18] .ctors            PROGBITS         0000000000600e18  00000e18
       0000000000000010  0000000000000000  WA       0     0     8
  [19] .dtors            PROGBITS         0000000000600e28  00000e28
       0000000000000010  0000000000000000  WA       0     0     8
  [20] .jcr              PROGBITS         0000000000600e38  00000e38
       0000000000000008  0000000000000000  WA       0     0     8
  [21] .dynamic          DYNAMIC          0000000000600e40  00000e40
       00000000000001a0  0000000000000010  WA       7     0     8
  [22] .got              PROGBITS         0000000000600fe0  00000fe0
       0000000000000008  0000000000000008  WA       0     0     8
  [23] .got.plt          PROGBITS         0000000000600fe8  00000fe8
       0000000000000028  0000000000000008  WA       0     0     8
  [24] .data             PROGBITS         0000000000601010  00001010
       0000000000000018  0000000000000000  WA       0     0     8
  [25] .bss              NOBITS           0000000000601028  00001028
       0000000000000018  0000000000000000  WA       0     0     8
  [26] .comment          PROGBITS         0000000000000000  00001028
       0000000000000048  0000000000000001  MS       0     0     1
  [27] .shstrtab         STRTAB           0000000000000000  00001070
       00000000000000e1  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
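So we see that the .gnu.version section was stripped off. The .symtab and .strtab sections are gone as well, because strip removes all symbols by default when no other stripping option is given; that is why the count drops from 31 to 28 section headers.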
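The same check can be done mechanically by extracting the section names from both readelf -S listings and reporting the ones that disappeared. This is a sketch under the assumption that the output looks like the listings above; section_names is a made-up helper, and the excerpts keep only a few of the header lines.

```python
import re

# A header line looks like "  [ 8] .gnu.version      VERSYM ...".
SECTION_NAME = re.compile(r"^\s*\[\s*\d+\]\s+(\.\S+)", re.MULTILINE)


def section_names(readelf_output):
    """Return the section names in the order readelf lists them."""
    return SECTION_NAME.findall(readelf_output)


before = """\
  [ 7] .dynstr           STRTAB           0000000000400340  00000340
  [ 8] .gnu.version      VERSYM           0000000000400380  00000380
  [ 9] .gnu.version_r    VERNEED          0000000000400388  00000388
  [28] .shstrtab         STRTAB           0000000000000000  00001070
  [29] .symtab           SYMTAB           0000000000000000  00001930
  [30] .strtab           STRTAB           0000000000000000  00001fa8
"""
after = """\
  [ 7] .dynstr           STRTAB           0000000000400340  00000340
  [ 8] .gnu.version_r    VERNEED          0000000000400388  00000388
  [27] .shstrtab         STRTAB           0000000000000000  00001070
"""

kept = set(section_names(after))
removed = [name for name in section_names(before) if name not in kept]
print(removed)
```

Note that readelf truncates long names in this view (for example .note.gnu.build-i), so the comparison works on the names as printed.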

4. Remove unneeded symbols using --strip-unneeded option

The unneeded symbols that are not required for relocation processing can be stripped off using the --strip-unneeded option.
Consider the following example :
$ strip --strip-unneeded example
The above command should have stripped the unneeded symbols from the executable.
Confirm this using the readelf command. In its output you'll notice that unneeded information, such as the .symtab section, was stripped off.
$ readelf -a example

5. Shield a particular symbol from stripping using -K option

If all the symbols need to be stripped off except one, supply the name of the symbol to keep along with the -K option.
Consider the example below :
$ strip -s -Kexample.c example
$ readelf -s example

Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)

Symbol table '.symtab' contains 29 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS example.c
...
...
...
So we see that the symbol example.c was not stripped off. Please note that multiple -K options can be used in the same command.
Note: I am not sure why some other symbols were also not stripped off along with example.c in the example above. Any type of knowledge and suggestions are welcome on this.

6. Strip off a particular symbol using -N option

In a scenario where only a particular symbol is to be stripped off, just supply the symbol name along with the -N option.
Consider the example below :
$ strip -Nexample.c example
The above command should have stripped off the symbol example.c from the executable.
Confirming it using readelf :
$ readelf -s example

Symbol table '.dynsym' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)

Symbol table '.symtab' contains 68 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000400238     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000400254     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000400274     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000400298     0 SECTION LOCAL  DEFAULT    4
     5: 00000000004002c0     0 SECTION LOCAL  DEFAULT    5
     6: 00000000004002e0     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000400340     0 SECTION LOCAL  DEFAULT    7
     8: 0000000000400380     0 SECTION LOCAL  DEFAULT    8
     9: 0000000000400388     0 SECTION LOCAL  DEFAULT    9
    10: 00000000004003a8     0 SECTION LOCAL  DEFAULT   10
    11: 00000000004003c0     0 SECTION LOCAL  DEFAULT   11
    12: 00000000004003f0     0 SECTION LOCAL  DEFAULT   12
    13: 0000000000400408     0 SECTION LOCAL  DEFAULT   13
    14: 0000000000400440     0 SECTION LOCAL  DEFAULT   14
    15: 0000000000400668     0 SECTION LOCAL  DEFAULT   15
    16: 0000000000400678     0 SECTION LOCAL  DEFAULT   16
    17: 0000000000400688     0 SECTION LOCAL  DEFAULT   17
    18: 00000000004006b8     0 SECTION LOCAL  DEFAULT   18
    19: 0000000000600e18     0 SECTION LOCAL  DEFAULT   19
    20: 0000000000600e28     0 SECTION LOCAL  DEFAULT   20
    21: 0000000000600e38     0 SECTION LOCAL  DEFAULT   21
    22: 0000000000600e40     0 SECTION LOCAL  DEFAULT   22
    23: 0000000000600fe0     0 SECTION LOCAL  DEFAULT   23
    24: 0000000000600fe8     0 SECTION LOCAL  DEFAULT   24
    25: 0000000000601010     0 SECTION LOCAL  DEFAULT   25
    26: 0000000000601028     0 SECTION LOCAL  DEFAULT   26
    27: 0000000000000000     0 SECTION LOCAL  DEFAULT   27
    28: 000000000040046c     0 FUNC    LOCAL  DEFAULT   14 call_gmon_start
    29: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    30: 0000000000600e18     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_LIST__
    31: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   20 __DTOR_LIST__
    32: 0000000000600e38     0 OBJECT  LOCAL  DEFAULT   21 __JCR_LIST__
    33: 0000000000400490     0 FUNC    LOCAL  DEFAULT   14 __do_global_dtors_aux
    34: 0000000000601028     1 OBJECT  LOCAL  DEFAULT   26 completed.7382
    35: 0000000000601030     8 OBJECT  LOCAL  DEFAULT   26 dtor_idx.7384
    36: 0000000000400500     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    38: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   19 __CTOR_END__
    39: 0000000000400750     0 OBJECT  LOCAL  DEFAULT   18 __FRAME_END__
    40: 0000000000600e38     0 OBJECT  LOCAL  DEFAULT   21 __JCR_END__
    41: 0000000000400630     0 FUNC    LOCAL  DEFAULT   14 __do_global_ctors_aux
    42: 0000000000601020     4 OBJECT  LOCAL  DEFAULT   25 i
    43: 0000000000601038     4 OBJECT  LOCAL  DEFAULT   26 local.2047
    44: 0000000000600fe8     0 OBJECT  LOCAL  HIDDEN   24 _GLOBAL_OFFSET_TABLE_
    45: 0000000000600e14     0 NOTYPE  LOCAL  HIDDEN   19 __init_array_end
    46: 0000000000600e14     0 NOTYPE  LOCAL  HIDDEN   19 __init_array_start
    47: 0000000000600e40     0 OBJECT  LOCAL  HIDDEN   22 _DYNAMIC
    48: 0000000000601010     0 NOTYPE  WEAK   DEFAULT   25 data_start
    49: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.2.5
    50: 0000000000400590     2 FUNC    GLOBAL DEFAULT   14 __libc_csu_fini
    51: 0000000000400440     0 FUNC    GLOBAL DEFAULT   14 _start
    52: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    53: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    54: 0000000000400668     0 FUNC    GLOBAL DEFAULT   15 _fini
    55: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    56: 0000000000601024     4 OBJECT  GLOBAL DEFAULT   25 global
    57: 0000000000400678     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used
    58: 0000000000601010     0 NOTYPE  GLOBAL DEFAULT   25 __data_start
    59: 0000000000601018     0 OBJECT  GLOBAL HIDDEN   25 __dso_handle
    60: 0000000000600e30     0 OBJECT  GLOBAL HIDDEN   20 __DTOR_END__
    61: 00000000004005a0   137 FUNC    GLOBAL DEFAULT   14 __libc_csu_init
    62: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    63: 0000000000601040     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    64: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    65: 0000000000400524    27 FUNC    GLOBAL DEFAULT   14 inc_func
    66: 000000000040053f    67 FUNC    GLOBAL DEFAULT   14 main
    67: 00000000004003f0     0 FUNC    GLOBAL DEFAULT   12 _init
The absence of the example.c symbol in the output above confirms that it was stripped.

7. Create a new stripped file using the -o option

By default, the strip command replaces the existing executable or object file with the stripped version. If the stripped file should not replace the original, supply the name of the new file with the -o option.
Consider the following example:
$ strip -s -ostripped_example example
$ ls -lart stripped_example
-rwxr-xr-x 1 himanshu family 6304 2012-08-22 21:49 stripped_example
So we see that the new file ‘stripped_example’ was created.

8. Preserve the access and modification date/time using -p option

To preserve the access and modification times of the original file in the stripped file, use the -p option.
Consider the following example:
Let's first check the access and modification times of the original file using the stat command:
$ stat example
  File: `example'
  Size: 8634       Blocks: 24         IO Block: 4096   regular file
Device: 805h/2053d Inode: 1443986     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1000/himanshu)   Gid: ( 1001/  family)
Access: 2012-08-22 21:54:28.393778010 +0530
Modify: 2012-08-22 21:54:28.393778010 +0530
Change: 2012-08-22 21:54:28.393778010 +0530
Now, we strip the file:
$ strip -s -p example
Now, check the access and modification times again:
$ stat example
  File: `example'
  Size: 6304       Blocks: 16         IO Block: 4096   regular file
Device: 805h/2053d Inode: 1447364     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1000/himanshu)   Gid: ( 1001/  family)
Access: 2012-08-22 21:54:28.000000000 +0530
Modify: 2012-08-22 21:54:28.000000000 +0530
Change: 2012-08-22 21:54:38.033844203 +0530
So we see that the access and modification times were preserved to the second; only the fractional part was reset to zero. The change time is newer because the file itself was rewritten.

9. Read command line options from file using the @file option

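An argument of the form @file tells strip to read its command-line options from that file; the options found there are used in place of the @file argument.
Consider the following example: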
$ echo "-s example"
-s example

$ echo "-s example" > options.txt

$ cat options.txt
-s example

$ strip @options.txt
The command completes without an error, which indicates that strip read its option and the file name from options.txt.

10. Get verbose output using -v option

To see what is going on behind the scenes while the strip command works, use the -v option.
Consider the following example:
$ strip -v example a.out bufferoverflow
copy from `example' [elf64-x86-64] to `stiBqF4K' [elf64-x86-64]
copy from `a.out' [elf64-x86-64] to `stN5L0lp' [elf64-x86-64]
copy from `bufferoverflow' [elf64-x86-64] to `stYVKfE3' [elf64-x86-64]
So we see that strip reported its intermediate steps when it was asked to strip three executables.

Reverse Engineering Tools in Linux – strings, nm, ltrace, strace, LD_PRELOAD


This article explains the tools and commands that can be used to reverse engineer an executable in a Linux environment.
Reverse engineering is the act of figuring out what a piece of software does when no source code is available. It may not give you the exact details of the software, but you can understand fairly well how it was implemented.
Reverse engineering involves the following three basic steps:
  1. Gathering the Info
  2. Determining Program behavior
  3. Intercepting the library calls

I. Gathering the Info

The first step is to gather information about the target program and what it does. For our example, we will take the ‘who’ command, which prints the list of currently logged-in users.

1. Strings Command

strings is a command that prints the sequences of printable characters found in files. Let’s use it against our target, the ‘who’ command.
# strings /usr/bin/who
Some of the important strings are,
users=%lu
EXIT
COMMENT
IDLE
TIME
LINE
NAME
/dev/
/var/log/wtmp
/var/run/utmp
/usr/share/locale
Michael Stone
David MacKenzie
Joseph Arceneaux
From the above output, we can see that ‘who’ refers to three paths (/var/log/wtmp, /var/run/utmp, /usr/share/locale).
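To make the idea behind strings more concrete, here is a rough approximation built from tr and grep. It is only a sketch: the real strings command has more options, and the file name sample.bin and its contents are made up for this illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file: readable text mixed with non-printable bytes.
printf 'ab\000hello\001world!!\000xy' > sample.bin

# Turn every non-printable byte into a newline, then keep only runs of
# at least 4 characters (4 is also the default minimum length in strings).
found=$(LC_ALL=C tr -c '[:print:]' '\n' < sample.bin | LC_ALL=C grep -E '.{4,}')
printf '%s\n' "$found"
```

This prints hello and world!! on separate lines; the two-character runs ab and xy are dropped because they are shorter than four characters.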

2. nm Command

The nm command lists the symbols of the target program. With nm, we can find out which local and library functions and which global variables are used. nm cannot list the symbols of a program that has been stripped using the ‘strip’ command.
Note: By default ‘who’ command is stripped. For this example, I compiled the ‘who’ command once again.
# nm /usr/bin/who
This will list the following:
08049110 t print_line
08049320 t time_string
08049390 t print_user
08049820 t make_id_equals_comment
080498b0 t who
0804a170 T usage
0804a4e0 T main
0804a900 T set_program_name
08051ddc b need_runlevel
08051ddd b need_users
08051dde b my_line_only
08051de0 b time_format
08051de4 b time_format_width
08051de8 B program_name
08051d24 D Version
08051d28 D exit_failure
In the above output:
  • t|T – The symbol is in the .text (code) section
  • b|B – The symbol is in the uninitialized data (.bss) section
  • D|d – The symbol is in the initialized data (.data) section
An uppercase letter marks a global symbol and a lowercase letter marks a local one.
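The letter rules can be captured in a small shell function. This is only an illustrative sketch: describe_nm_type is a made-up helper name, it covers just the three letter pairs discussed here, and nm defines many more type letters.

```shell
# Hypothetical helper: explain one symbol-type letter from nm output.
describe_nm_type() {
  letter=$1
  case $letter in
    t|T) section="text (code) section" ;;
    b|B) section="BSS (uninitialized data) section" ;;
    d|D) section="initialized data section" ;;
    *)   echo "$letter: not covered by this sketch"; return 0 ;;
  esac
  # For these letters, uppercase means global and lowercase means local.
  case $letter in
    [[:upper:]]) scope="global" ;;
    *)           scope="local" ;;
  esac
  printf '%s: %s, %s\n' "$letter" "$scope" "$section"
}

describe_nm_type t   # e.g. print_line in the listing
describe_nm_type T   # e.g. main
describe_nm_type b   # e.g. need_users
describe_nm_type D   # e.g. Version
```

For the four calls it prints "t: local, text (code) section", "T: global, text (code) section", "b: local, BSS (uninitialized data) section" and "D: global, initialized data section".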
From the above output, we can learn the following:
  • It has global functions (main, set_program_name, usage, etc.)
  • It has some local functions (print_user, time_string, etc.)
  • It has global initialized variables (Version, exit_failure)
  • It has uninitialized variables (time_format, time_format_width, etc.)
Sometimes, by using the function names we can guess what the functions will do.
The other commands that can be used to get information are

II. Determining Program Behavior

3. ltrace Command

ltrace traces the calls a program makes to library functions. It runs the given program and records each library call until the program exits.
# ltrace /usr/bin/who
The output is shown below.
utmpxname(0x8050c6c, 0xb77068f8, 0, 0xbfc5cdc0, 0xbfc5cd78)          = 0
setutxent(0x8050c6c, 0xb77068f8, 0, 0xbfc5cdc0, 0xbfc5cd78)          = 1
getutxent(0x8050c6c, 0xb77068f8, 0, 0xbfc5cdc0, 0xbfc5cd78)          = 0x9ed5860
realloc(NULL, 384)                                                   = 0x09ed59e8
getutxent(0, 384, 0, 0xbfc5cdc0, 0xbfc5cd78)                         = 0x9ed5860
realloc(0x09ed59e8, 768)                                             = 0x09ed59e8
getutxent(0x9ed59e8, 768, 0, 0xbfc5cdc0, 0xbfc5cd78)                 = 0x9ed5860
realloc(0x09ed59e8, 1152)                                            = 0x09ed59e8
getutxent(0x9ed59e8, 1152, 0, 0xbfc5cdc0, 0xbfc5cd78)                = 0x9ed5860
realloc(0x09ed59e8, 1920)                                            = 0x09ed59e8
getutxent(0x9ed59e8, 1920, 0, 0xbfc5cdc0, 0xbfc5cd78)                = 0x9ed5860
getutxent(0x9ed59e8, 1920, 0, 0xbfc5cdc0, 0xbfc5cd78)                = 0x9ed5860
realloc(0x09ed59e8, 3072)                                            = 0x09ed59e8
getutxent(0x9ed59e8, 3072, 0, 0xbfc5cdc0, 0xbfc5cd78)                = 0x9ed5860
getutxent(0x9ed59e8, 3072, 0, 0xbfc5cdc0, 0xbfc5cd78)                = 0x9ed5860
getutxent(0x9ed59e8, 3072, 0, 0xbfc5cdc0, 0xbfc5cd78)
You can observe a series of calls to getutxent and its family of library functions. Note also that ltrace lists the calls in the order in which the program makes them.
Now we know that the ‘who’ command works by calling getutxent and its family of functions to get the logged-in users.

4. strace Command

The strace command traces the system calls made by a program. If a program uses only system calls and no library functions, plain ltrace cannot trace its execution.
# strace /usr/bin/who
[b76e7424] brk(0x887d000)               = 0x887d000
[b76e7424] access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file or directory)
[b76e7424] open("/var/run/utmp", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
.
.
.
[b76e7424] fcntl64(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
[b76e7424] read(3, "\10\325"..., 384) = 384
[b76e7424] fcntl64(3, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
You can observe that a call to malloc can result in the brk() system call. The getutxent library function calls the ‘open’ system call to open ‘/var/run/utmp’, places a read lock on the file, reads the contents, and then releases the lock.
This confirms that the who command reads the utmp file to produce its output.
Both ‘strace’ and ‘ltrace’ have a set of useful options:
  • -p pid – Attaches to the specified pid. Useful if the program is already running and you want to know its behavior.
  • -n 2 – Indent each nested call by 2 spaces (ltrace only).
  • -f – Follow child processes created by fork.

III. Intercepting the library calls

5. LD_PRELOAD & LD_LIBRARY_PATH

LD_PRELOAD allows us to add a library to a particular execution of the program. The functions in this library will override the actual library functions of the same name.
Note: We can’t use this with programs set with ‘suid’ bit.
Let’s take the following program.
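#include <stdio.h>
#include <string.h>
int main() {
  char str1[]="TGS";
  char str2[]="tgs";
  /* strcmp returns 0 when the strings are equal, non-zero otherwise */
  if(strcmp(str1,str2)) {
    printf("Strings are not matched\n");
  }
  else {
    printf("Strings are matched\n");
  }
  return 0;
}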
Compile and execute the program.
# cc -o my_prg my_prg.c
# ./my_prg
It will print “Strings are not matched”.
Now we will write our own library and we will see how we can intercept the library function.
#include <string.h>  /* lets the compiler check our prototype against the real one */
int strcmp(const char *s1, const char *s2) {
  // Always return 0.
  return 0;
}
Save this code as library.c, compile it as a shared library, and add the current directory to the LD_LIBRARY_PATH variable.
# cc -fPIC -shared -o mylibrary.so library.c -ldl
# export LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH
Now a file named ‘mylibrary.so’ will be created.
Set the LD_PRELOAD variable to this file and execute the string comparison program.
# LD_PRELOAD=mylibrary.so ./my_prg
Now it will print ‘Strings are matched’ because the program uses our version of the strcmp function.
Note: If you want to intercept a library function, your own function must have the same prototype as the original library function.
We have just covered the very basic things needed to reverse engineer a program.
For those who would like to take the next step in reverse engineering, understanding the ELF file format and assembly language will help a great deal.

15 Linux Split and Join Command Examples to Manage Large Files


The Linux split and join commands are very helpful when you are manipulating large files. This article explains how to use them with descriptive examples.
Join and split command syntax:
join [OPTION]… FILE1 FILE2
split [OPTION]… [INPUT [PREFIX]]

Linux Split Command Examples

1. Basic Split Example

Here is a basic example of split command.
$ split split.zip 

$ ls
split.zip  xab  xad  xaf  xah  xaj  xal  xan  xap  xar  xat  xav  xax  xaz  xbb  xbd  xbf  xbh  xbj  xbl  xbn
xaa        xac  xae  xag  xai  xak  xam  xao  xaq  xas  xau  xaw  xay  xba  xbc  xbe  xbg  xbi  xbk  xbm  xbo
So we see that the file split.zip was split into smaller files named x**, where ** is the two-character suffix that is added by default. Also, by default each x** file contains up to 1000 lines.
$ wc -l *
   40947 split.zip
    1000 xaa
    1000 xab
    1000 xac
    1000 xad
    1000 xae
    1000 xaf
    1000 xag
    1000 xah
    1000 xai
...
...
...
So the output above confirms that by default each x** file contains 1000 lines.
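The pieces can be put back together with cat, because the default suffixes sort in the order in which split wrote them. The following is a minimal sketch; the file names bigfile.txt and rejoined.txt are made up for this illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 2500 numbered lines, so the default of 1000 lines
# per piece gives three pieces (xaa, xab, xac).
seq 1 2500 > bigfile.txt
split bigfile.txt

# The shell expands x?? in sorted order, which is the order split wrote them.
cat x?? > rejoined.txt

# cmp prints nothing and exits with status 0 when the files are identical.
if cmp bigfile.txt rejoined.txt; then
  echo "rejoined file matches the original"
fi
```

Comparing the result with cmp (or a checksum) is a cheap way to confirm that no piece was lost or concatenated out of order.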

2. Change the Suffix Length using -a option

As discussed in example 1 above, the default suffix length is 2. But this can be changed by using -a option.
As you see in the following example, it is using suffix of length 5 on the split files.
$ split -a5 split.zip
$ ls
split.zip  xaaaac  xaaaaf  xaaaai  xaaaal  xaaaao  xaaaar  xaaaau  xaaaax  xaaaba  xaaabd  xaaabg  xaaabj  xaaabm
xaaaaa     xaaaad  xaaaag  xaaaaj  xaaaam  xaaaap  xaaaas  xaaaav  xaaaay  xaaabb  xaaabe  xaaabh  xaaabk  xaaabn
xaaaab     xaaaae  xaaaah  xaaaak  xaaaan  xaaaaq  xaaaat  xaaaaw  xaaaaz  xaaabc  xaaabf  xaaabi  xaaabl  xaaabo
Note: Earlier we also discussed other file manipulation utilities: tac, rev, paste.

3. Customize Split File Size using -b option

Size of each output split file can be controlled using -b option.
In this example, the split files were created with a size of 200000 bytes.
$ split -b200000 split.zip 

$ ls -lart
total 21084
drwxrwxr-x 3 himanshu himanshu     4096 Sep 26 21:20 ..
-rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xad
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xac
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xab
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaa
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xah
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xag
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaf
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xae
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xar
...
...
...
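The number of pieces to expect follows from simple integer arithmetic. The sketch below uses the size of split.zip shown in the listing; the variable names are chosen only for this illustration.

```shell
size=10767315    # size of split.zip in bytes, from the listing
chunk=200000     # value passed to -b

full=$((size / chunk))                   # pieces of exactly 200000 bytes
rest=$((size % chunk))                   # bytes left for the last piece
total=$(( (size + chunk - 1) / chunk ))  # all pieces, rounding up

echo "$full full pieces, last piece $rest bytes, $total pieces in total"
```

This prints "53 full pieces, last piece 167315 bytes, 54 pieces in total".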

4. Create Split Files with Numeric Suffix using -d option

As seen in the examples above, the output files are named x**, where ** are letters. You can change the suffix to digits using the -d option.
Here is an example. The split files have a numeric suffix.
$ split -d split.zip
$ ls
split.zip  x01  x03  x05  x07  x09  x11  x13  x15  x17  x19  x21  x23  x25  x27  x29  x31  x33  x35  x37  x39
x00        x02  x04  x06  x08  x10  x12  x14  x16  x18  x20  x22  x24  x26  x28  x30  x32  x34  x36  x38  x40

5. Customize the Number of Split Chunks using -n option

To control the number of chunks, use the -n option.
This example will create 50 chunks of split files.
$ split -n50 split.zip
$ ls
split.zip  xac  xaf  xai  xal  xao  xar  xau  xax  xba  xbd  xbg  xbj  xbm  xbp  xbs  xbv
xaa        xad  xag  xaj  xam  xap  xas  xav  xay  xbb  xbe  xbh  xbk  xbn  xbq  xbt  xbw
xab        xae  xah  xak  xan  xaq  xat  xaw  xaz  xbc  xbf  xbi  xbl  xbo  xbr  xbu  xbx

6. Avoid Zero Sized Chunks using -e option

When splitting a relatively small file into a large number of chunks, it's good to avoid zero-sized chunks, as they do not add any value. This can be done using the -e option.
Here is an example:
$ split -n50 testfile

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
...
...
...
So we see that many zero-sized chunks were produced in the above output. Now, let's use the -e option and see the results:
$ split -n50 -e testfile
$ ls
split.zip  testfile  xaa  xab  xac  xad  xae  xaf

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa
So we see that no zero sized chunk was produced in the above output.
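The same behavior can be checked in a script rather than by reading the listing. This is a sketch under assumptions: the input file tiny and its 6-byte content are made up, and the -n and -e options require GNU split.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 6-byte input, similar in size to testfile in the example.
printf 'abcdef' > tiny

# Ask for 50 chunks but suppress the empty ones with -e.
split -n 50 -e tiny

# Count the pieces and check that none of them is empty.
pieces=$(ls x?? | wc -l)
empty=$(find . -name 'x??' -size 0 | wc -l)
echo "pieces: $pieces, empty pieces: $empty"

# Concatenating the pieces still gives back the original bytes.
cat x?? | cmp - tiny && echo "content preserved"
```

With a 6-byte input there can be at most six non-empty pieces, so -e keeps the directory free of the empty files seen earlier.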

7. Customize Number of Lines using -l option

Number of lines per output split file can be customized using the -l option.
As seen in the example below, split files are created with 20000 lines.
$ split -l20000 split.zip

$ ls
split.zip  testfile  xaa  xab  xac

$ wc -l x*
   20000 xaa
   20000 xab
     947 xac
   40947 total
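The syntax line at the top of this article also lists an optional PREFIX argument, which none of the examples use. Here is a minimal sketch that combines it with -l; the input file lines.txt and the prefix part_ are made up for this illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input with 25 lines.
seq 1 25 > lines.txt

# 10 lines per piece, and output names that start with part_ instead of x.
split -l 10 lines.txt part_

# Two full pieces and one piece with the remaining 5 lines.
wc -l part_*
```

The pieces are named part_aa, part_ab and part_ac, which makes it easier to tell which input they came from.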

Get Detailed Information using --verbose option

To get a diagnostic message each time a new split file is opened, use the --verbose option as shown below.
$ split -l20000 --verbose split.zip
creating file `xaa'
creating file `xab'
creating file `xac'

Linux Join Command Examples

8. Basic Join Example

By default, the join command matches lines from the two input files whose first fields are equal and prints one combined line for each match.
Here is an example:
$ cat testfile1
1 India
2 US
3 Ireland
4 UK
5 Canada

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
4 UK London
5 Canada Toronto
So we see that a file containing countries was joined with another file containing capitals on the basis of first field.

9. Join works on Sorted List

join expects both input files to be sorted on the join field. If either file is not sorted, join prints a warning and the out-of-order entry is not joined.
In this example, since testfile1 is not sorted, a warning/error message is displayed.
$ cat testfile1
1 India
2 US
3 Ireland
5 Canada
4 UK

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
join: testfile1:5: is not sorted: 4 UK
5 Canada Toronto
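The usual remedy is to sort both inputs on the join field before joining. A minimal sketch, reusing the two files from this example; the output names sorted1 and sorted2 are made up.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The unsorted testfile1 and the sorted testfile2 from the example.
printf '%s\n' '1 India' '2 US' '3 Ireland' '5 Canada' '4 UK' > testfile1
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' '4 London' '5 Toronto' > testfile2

# Sort both inputs on the first field; LC_ALL=C makes sort and join
# agree on the same byte-wise ordering.
LC_ALL=C sort -k1,1 testfile1 > sorted1
LC_ALL=C sort -k1,1 testfile2 > sorted2

joined=$(LC_ALL=C join sorted1 sorted2)
printf '%s\n' "$joined"
```

All five lines are now joined, including "4 UK London", and no warning is printed. Using the same locale for sort and join matters because join compares fields as text, not as numbers.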

10. Ignore Case using -i option

When comparing fields, the difference in case can be ignored using -i option as shown below.
$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
a NewDelhi
B Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
c Ireland Dublin
d UK London
e Canada Toronto

$ join -i testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

11. Verify that Input is Sorted using --check-order option

Here is an example. Since testfile1 is unsorted towards the end, an error is produced in the output.
$ cat testfile1
a India
b US
c Ireland
d UK
f Australia
e Canada

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join --check-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
join: testfile1:6: is not sorted: e Canada

12. Do not Check the Sort Order using --nocheck-order option

This is the opposite of the previous example. The sort order is not checked, so no error message is displayed.
$ join --nocheck-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London

13. Print Unpairable Lines using -a option

If the two input files cannot be mapped one to one, the -a[FILENUM] option also prints the lines of the given file that could not be paired. FILENUM is the file number (1 or 2).
In the following example, using -a1 adds the last line of testfile1 (f Australia), which has no pair in testfile2, to the output.
$ cat testfile1
a India
b US
c Ireland
d UK
e Canada
f Australia

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

$ join -a1 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
f Australia

14. Print Only Unpaired Lines using -v option

In the above example both paired and unpaired lines were produced in the output. But, if only unpaired output is desired then use -v option as shown below.
$ join -v1 testfile1 testfile2
f Australia
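The -v option can be given for either file, or for both at once. The sketch below assumes a slightly different testfile2: the extra line "g Tokyo" is made up so that each file has one entry without a partner.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# testfile1 as in the example; testfile2 gets a hypothetical extra line.
printf '%s\n' 'a India' 'b US' 'c Ireland' 'd UK' 'e Canada' 'f Australia' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'c Dublin' 'd London' 'e Toronto' 'g Tokyo' > testfile2

only1=$(LC_ALL=C join -v1 testfile1 testfile2)      # unpaired lines of file 1
only2=$(LC_ALL=C join -v2 testfile1 testfile2)      # unpaired lines of file 2
both=$(LC_ALL=C join -v1 -v2 testfile1 testfile2)   # unpaired lines of either file

printf 'only in testfile1: %s\n' "$only1"
printf 'only in testfile2: %s\n' "$only2"
printf '%s\n' "$both"
```

Here -v1 reports "f Australia", -v2 reports "g Tokyo", and giving both options lists the two lines together, which is a quick way to see where two sorted files disagree.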

15. Join Based on Different Columns from Both Files using -1 and -2 option

By default, the first column of each file is used for comparison before joining. You can change this behavior using the -1 and -2 options.
In the following example, the first column of testfile1 was compared with the second column of testfile2 to produce the join command output.
$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
NewDelhi a
Washington b
Dublin c
London d
Toronto e

$ join -1 1 -2 2 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
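When the join field is not the first column, the file still has to be sorted on that field. The sketch below assumes a variant of testfile2 that is ordered by city name instead; the file name testfile2.sorted is made up.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'a India' 'b US' 'c Ireland' 'd UK' 'e Canada' > testfile1
# Hypothetical variant of testfile2, ordered by city name, so its second
# column (the join field) is not in sorted order.
printf '%s\n' 'Dublin c' 'London d' 'NewDelhi a' 'Toronto e' 'Washington b' > testfile2

# Sort testfile2 on its second field only, then join field 1 with field 2.
LC_ALL=C sort -k2,2 testfile2 > testfile2.sorted
joined=$(LC_ALL=C join -1 1 -2 2 testfile1 testfile2.sorted)
printf '%s\n' "$joined"
```

The output is the same five lines as in the example, starting with "a India NewDelhi": join prints the join field first, then the remaining fields of the first file, then those of the second.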