5.1 使用场景: 如果需要对2个文件的数据实现像数据库里面的全连接一样的操作就可以用以下经验、命令实现。 这里的全连接是指对2个文件的数据按照某一列
来做并集的操作,也就是两个文件会指定一个公有的列(当然这个列的数据可以有相同也可以有不同),然后求这个列的并集以及能同时输出其他列的值。如果某个文件的某一列
(公有列除外)不存在于另一个文件的这一列,那么我们就要变量替换掉这个“不存在”的值。就像数据库会帮你替换成NULL值一样。
5.2 上代码例子!
[jianpx@jianpxs-MacBook-Pro ~/temp/test/join]$ cat a
a,1
b,2
c,3
d,4
[jianpx@jianpxs-MacBook-Pro ~/temp/test/join]$ cat b
c,7
d,9
e,10
a、b两个文件我默认已经按照第一列排好序了(我们利用join命令时必须保证两个文件是按照某一个公有的列有序的!)
然后我的需求是要输出类似这样的:
a,1,NULL
b,2,NULL
c,3,7
d,4,9
e,NULL,10
从结果可以分析看看, 因为e开头的这一行不存在与a文件,所以连接的时候对应第二列的位置就应该被置为NULL(当然你也可以用其他默认值代替NULL,比数据库要灵活一点)
而a开头的这一样也不存在于b文件,所以第三列的位置也应该是NULL。
从 man join得到的参数来看, -a 、-e、-o都能帮到我。
-a field_number : In addition to the default output, produce a line for each unpairable line in file file_number. 意思是把两个文件中不匹配的行也输出出来。
-e string: Replace empty output fields with string 意思是指定需要变量替换的值。
-o list: The -o option specifies the fields that will be output from each file for each line with matching join fields. Each element of list has the either the
form `file_number.field', where file_number is a file number and field is a field number, or the form `0' (zero), representing the join field. The ele-
ments of list must be either comma (`,') or whitespace separated. (The latter requires quoting to protect it from the shell, or, a simpler approach is
to use multiple -o options.) 其实这里对我最有用的就是0可以代表公有的列的位置,这样就不用用一个明确的位置(比如1)来指定这个公共列了,因为
用明确位置指定的缺点是这个公有列如果在两个文件上的位置是不一样就会有被替换的可能,比如在文件a是第一列,在文件2是第二列。
不知道大家现在能才出来没?(可以先不看下面的答案自己思考下)
好, 公布答案了, 看看你们做的对不对。
[jianpx@jianpxs-MacBook-Pro ~/temp/test/join]$ join -t, -j1 -o 0,1.2,2.2 -e NULL -a 1 -a 2 a b
a,1,NULL
b,2,NULL
c,3,7
d,4,9
e,NULL,10
分享到:
相关推荐
How to Implement ObjectOriented Programming Principles in MATLAB
This application shows you how to implement professional looking scrolling credits. There are two different demos: top to bottom and left to right. You can click on the credited individual to send him...
This application demonstrates how to implement a splitter bar in a Visual Basic Project.
SAP 方法论 Run SAP Methodology How to Implement End-to-End Solution Operations
在你的程序中实现打印支持Learn how to implement print support in your applications.
信息安全_数据安全_How to Implement a World Class I 云数据库 区块链 解决方案 安全攻击 网络安全
IMPLEMENT EVENTS IN TABLE MAINTENANCE
在win7中安装wince 6.0,这是正确的方法
records to support continuity of care. It is not anymore acceptable that, in the age of information technology, millions of Euros are wasted for unnecessarily repeated tests or that patients do not ...
Planing to Implement Service Management
A tool to generate class files to implement stored procedures创建类文件执行存储过程的工具
How to Develop and Implement a Security Master Plan
Whether it’s a section on lesser-known C# operators, how to sort strings that contain numbers in them, or how to implement Undo, this book contains recipes that are useful in a wide variety of ...
•How to Setup the plugin on your website. •How to use only the Basic plugin (minimal setup guide). •Security considerations. •The plugin API. •List of all available Options - including events and...
on the other hand, join operation in distributed database systems was never an easy task because data lo- cation and skewness makes join strategies harder to opti- mize. Fragment-replicate join (map ...
Teradata Cookbook Over 85 recipes to implement efficient data warehousing solutions 英文epub 本资源转载自网络,如有侵权,请联系上传者或csdn删除 查看此书详细信息请在美国亚马逊官网搜索此书
rsa using java to implement and we can encrypt and decrypt files.
Moving on, you will understand and explore the NoSQL paradigm and implement it using one of the most popular NoSQL databases, MongoDB, with some awesome libraries to make the experience very ...
Machine Learning: Step-by-Step Guide To Implement Machine Learning Algorithms with Python By 作者: Rudolph Russell ISBN-10 书号: 1719528403 ISBN-13 书号: 9781719528405 出版日期: 2018-05-22 pages ...
firstly, we focus on detailed coverage of deep learning (DL) and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples. The second area of focus is real-world...